To my parents, for everything


Preface to the First Edition 


This text originated from the lecture notes I gave teaching the honours undergraduate- 
level real analysis sequence at the University of California, Los Angeles, in 2003. 
Among the undergraduates here, real analysis was viewed as being one of the most 
difficult courses to learn, not only because of the abstract concepts being introduced 
for the first time (e.g., topology, limits, measurability, etc.), but also because of the 
level of rigour and proof demanded of the course. Because of this perception of 
difficulty, one was often faced with the difficult choice of either reducing the level 
of rigour in the course in order to make it easier, or maintaining strict standards and
facing the prospect of many undergraduates, even many of the bright and enthusiastic
ones, struggling with the course material. 

Faced with this dilemma, I tried a somewhat unusual approach to the subject. 
Typically, an introductory sequence in real analysis assumes that the students are 
already familiar with the real numbers, with mathematical induction, with elementary 
calculus, and with the basics of set theory, and then quickly launches into the heart 
of the subject, for instance the concept of a limit. Normally, students entering this 
sequence do indeed have a fair bit of exposure to these prerequisite topics, though 
in most cases the material is not covered in a thorough manner. For instance, very 
few students were able to actually define a real number, or even an integer, properly, 
even though they could visualize these numbers intuitively and manipulate them 
algebraically. This seemed to me to be a missed opportunity. Real analysis is one 
of the first subjects (together with linear algebra and abstract algebra) that a student 
encounters, in which one truly has to grapple with the subtleties of a truly rigorous 
mathematical proof. As such, the course offered an excellent chance to go back to 
the foundations of mathematics, and in particular the opportunity to do a proper and 
thorough construction of the real numbers. 

Thus the course was structured as follows. In the first week, I described some 
well-known “paradoxes” in analysis, in which standard laws of the subject (e.g., 
interchange of limits and sums, or sums and integrals) were applied in a non-rigorous 
way to give nonsensical results such as 0 = 1. This motivated the need to go back to 
the very beginning of the subject, even to the very definition of the natural numbers, 
and check all the foundations from scratch. For instance, one of the first homework 
assignments was to check (using only the Peano axioms) that addition was associative
for natural numbers (i.e., that (a + b) + c = a + (b + c) for all natural numbers
a, b, c: see Exercise 2.2.1). Thus even in the first week, the students had to write
rigorous proofs using mathematical induction. After we had derived all the basic 
properties of the natural numbers, we then moved on to the integers (initially defined 
as formal differences of natural numbers); once the students had verified all the basic 
properties of the integers, we moved on to the rationals (initially defined as formal 
quotients of integers); and then from there we moved on (via formal limits of Cauchy 
sequences) to the reals. Around the same time, we covered the basics of set theory, 
for instance demonstrating the uncountability of the reals. Only then (after about ten 
lectures) did we begin what one normally considers the heart of undergraduate real 
analysis—limits, continuity, differentiability, and so forth. 

The response to this format was quite interesting. In the first few weeks, the 
students found the material very easy on a conceptual level, as we were dealing 
only with the basic properties of the standard number systems. But on an intellectual 
level it was very challenging, as one was analyzing these number systems from a 
foundational viewpoint, in order to rigorously derive the more advanced facts about 
these number systems from the more primitive ones. One student told me how diffi- 
cult it was to explain to his friends in the non-honours real analysis sequence (a) 
why he was still learning how to show why all rational numbers are either posi- 
tive, negative, or zero (Exercise 4.2.4), while the non-honours sequence was already 
distinguishing absolutely convergent and convergent series, and (b) why, despite this, 
he thought his homework was significantly harder than that of his friends. Another 
student commented to me, quite wryly, that while she could obviously see why one 
could always divide a natural number n by a positive integer q to give a quotient
a and a remainder r less than q (Exercise 2.3.5), she still had, to her frustration,
much difficulty in writing down a proof of this fact. (I told her that later in the 
course she would have to prove statements for which it would not be as obvious 
to see that the statements were true; she did not seem to be particularly consoled 
by this.) Nevertheless, these students greatly enjoyed the homework, as when they 
did persevere and obtain a rigorous proof of an intuitive fact, it solidified the link
in their minds between the abstract manipulations of formal mathematics and their 
informal intuition of mathematics (and of the real world), often in a very satisfying 
way. By the time they were assigned the task of giving the infamous “epsilon and 
delta” proofs in real analysis, they had already had so much experience with formal- 
izing intuition, and in discerning the subtleties of mathematical logic (such as the 
distinction between the “for all” quantifier and the “there exists” quantifier), that 
the transition to these proofs was fairly smooth, and we were able to cover material 
both thoroughly and rapidly. By the tenth week, we had caught up with the non- 
honours class, and the students were verifying the change of variables formula for 
Riemann-Stieltjes integrals, and showing that piecewise continuous functions were 
Riemann integrable. By the conclusion of the sequence in the twentieth week, we 
had covered (both in lecture and in homework) the convergence theory of Taylor 
and Fourier series, the inverse and implicit function theorems for continuously
differentiable functions of several variables, and established the dominated convergence
theorem for the Lebesgue integral.

In order to cover this much material, many of the key foundational results were 
left to the student to prove as homework; indeed, this was an essential aspect of the 
course, as it ensured the students truly appreciated the concepts as they were being 
introduced. This format has been retained in this text; the majority of the exercises 
consist of proving lemmas, propositions and theorems in the main text. Indeed, I 
would strongly recommend that one do as many of these exercises as possible—and 
this includes those exercises proving “obvious” statements—if one wishes to use this 
text to learn real analysis; this is not a subject whose subtleties are easily appreciated 
just from passive reading. Most of the chapter sections have a number of exercises, 
which are listed at the end of the section. 

To the expert mathematician, the pace of this book may seem somewhat slow, 
especially in early chapters, as there is a heavy emphasis on rigour (except for those 
discussions explicitly marked “Informal”), and justifying many steps that would
ordinarily be quickly passed over as being self-evident. The first few chapters develop (in
painful detail) many of the “obvious” properties of the standard number systems, for 
instance that the sum of two positive real numbers is again positive (Exercise 5.4.1), 
or that given any two distinct real numbers, one can find a rational number between
them (Exercise 5.4.5). In these foundational chapters, there is also an emphasis on 
non-circularity—not using later, more advanced results to prove earlier, more prim- 
itive ones. In particular, the usual laws of algebra are not used until they are derived 
(and they have to be derived separately for the natural numbers, integers, rationals, 
and reals). The reason for this is that it allows the students to learn the art of abstract 
reasoning, deducing true facts from a limited set of assumptions, in the friendly and 
intuitive setting of number systems; the payoff for this practice comes later, when one 
has to utilize the same type of reasoning techniques to grapple with more advanced 
concepts (e.g., the Lebesgue integral). 

The text here evolved from my lecture notes on the subject, and thus is very much 
oriented towards a pedagogical perspective; much of the key material is contained 
inside exercises, and in many cases I have chosen to give a lengthy and tedious, but 
instructive, proof instead of a slick abstract proof. In more advanced textbooks, the 
student will see shorter and more conceptually coherent treatments of this material, 
and with more emphasis on intuition than on rigour; however, I feel it is important to 
know how to do analysis rigorously and “by hand” first, in order to truly appreciate 
the more modern, intuitive and abstract approach to analysis that one uses at the 
graduate level and beyond. 

The exposition in this book heavily emphasizes rigour and formalism; however,
this does not necessarily mean that lectures based on this book have to proceed the 
same way. Indeed, in my own teaching I have used the lecture time to present the 
intuition behind the concepts (drawing many informal pictures and giving examples), 
thus providing a complementary viewpoint to the formal presentation in the text. 
The exercises assigned as homework provide an essential bridge between the two, 
requiring the student to combine both intuition and formal understanding together 
in order to locate correct proofs for a problem. This I found to be the most difficult
task for the students, as it requires the subject to be genuinely learnt, rather than 
merely memorized or vaguely absorbed. Nevertheless, the feedback I received from 
the students was that the homework, while very demanding for this reason, was also 
very rewarding, as it allowed them to connect the rather abstract manipulations of 
formal mathematics with their innate intuition on such basic concepts as numbers, 
sets, and functions. Of course, the aid of a good teaching assistant is invaluable in 
achieving this connection. 

With regard to examinations for a course based on this text, I would recommend 
either an open-book, open-notes examination with problems similar to the exercises 
given in the text (but perhaps shorter, with no unusual trickery involved), or else 
a take-home examination that involves problems comparable to the more intricate 
exercises in the text. The subject matter is too vast to force the students to memorize 
the definitions and theorems, so I would not recommend a closed-book examination, 
or an examination based on regurgitating extracts from the book. (Indeed, in my own 
examinations I gave a supplemental sheet listing the key definitions and theorems 
which were relevant to the examination problems.) Making the examinations similar 
to the homework assigned in the course will also help motivate the students to work 
through and understand their homework problems as thoroughly as possible (as 
opposed to, say, using flash cards or other such devices to memorize material), which 
is good preparation not only for examinations but for doing mathematics in general. 

Some of the material in this textbook is somewhat peripheral to the main theme 
and may be omitted for reasons of time constraints. For instance, as set theory is 
not as fundamental to analysis as are the number systems, the chapters on set theory 
(Chapters 3, 8) can be covered more quickly and with substantially less rigour, or be 
given as reading assignments. The appendices on logic and the decimal system are 
intended as optional or supplemental reading and would probably not be covered in 
the main course lectures; the appendix on logic is particularly suitable for reading 
concurrently with the first few chapters. Also, Chapter 5 (on Fourier series) is not 
needed elsewhere in the text and can be omitted. 

For reasons of length, this textbook has been split into two volumes. The first 
volume is slightly longer, but can be covered in about thirty lectures if the peripheral 
material is omitted or abridged. The second volume refers at times to the first, but can 
also be taught to students who have had a first course in analysis from other sources. 
It also takes about thirty lectures to cover. 

I am deeply indebted to my students, who over the progression of the real analysis
course corrected several errors in the lecture notes from which this text is
derived, and gave other valuable feedback. I am also very grateful to the many 
anonymous referees who made several corrections and suggested many impor- 
tant improvements to the text. I also thank Adam, James Ameril, Quentin Batista, 
Biswaranjan Behara, José Antonio Lara Benitez, Dingjun Bian, Petrus Bianchi, 
Phillip Blagoveschensky, Tai-Danae Bradley, Brian, Eduardo Buscicchio, Carlos, 
cebismellim, Matheus Silva Costa, Gonzales Castillo Cristhian, Ck, William Deng, 
Kevin Doran, Lorenzo Dragani, EO, Florian, Gyao Gamm, Evangelos Georgiadis, 
Aditya Ghosh, Elie Goudout, Ti Gong, Ulrich Groh, Gökhan Güçlü, Yaver Gulusoy,
Christian Gz., Kyle Hambrook, Minyoung Jeong, Bart Kleijngeld, Erik Koelink, Brett
Lane, David Latorre, Matthis Lehmkühler, Bin Li, Percy Li, Ming Li, Mufei Li, Zijun
Liu, Rami Luisto, Jason M., Manoranjan Majji, Mercedes Mata, Simon Mayer, Geoff
Mess, Pieter Naaijkens, Vineet Nair, Jorge Peña-Vélez, Cristina Pereyra, Huaying
Qiu, David Radnell, Tim Reijnders, Issa Rice, Eric Rodriquez, Pieter Roffelsen,
Luke Rogers, Feras Saad, Gabriel Salmerón, Vijay Sarthak, Leopold Schlicht, Marc
Schoolderman, SkysubO, Rainer aus dem Spring, Sundar, Rafał Szlendak, Karim
Taya, Chaitanya Tappu, Winston Tsai, Kent Van Vels, Andrew Verras, Murtaza
Wani, Daan Wanrooy, John Waters, Yandong Xiao, Sam Xu, Xueping, Hongjiang
Ye, Luqing Ye, Muhammad Atif Zaheer, Zelin, and the students of Math 401/501 and
Math 402/502 at the University of New Mexico for corrections to the first, second, 
and third editions. 


Terence Tao 


Preface to Subsequent Editions 


Since the publication of the first edition, many students and lecturers have commu- 
nicated a number of minor typos and other corrections to me. There was also some 
demand for a hardcover edition of the texts. Because of this, the publishers and I 
have decided to incorporate the corrections and issue a hardcover second edition of 
the textbooks. The layout, page numbering, and indexing of the texts have also been 
changed; in particular the two volumes are now numbered and indexed separately. 
However, the chapter and exercise numbering, as well as the mathematical content, 
remains the same as the first edition, and so the two editions can be used more or 
less interchangeably for homework and study purposes. 

The third edition contains a number of corrections that were reported for the 
second edition, together with a few new exercises, but is otherwise essentially the
same text. The fourth edition similarly incorporates a large number of additional 
corrections reported since the release of the third edition, as well as some additional 
exercises. 


Los Angeles, USA Terence Tao 
Chapter 1
Metric Spaces


1.1 Definitions and Examples 


In Definition 6.1.5 we defined what it meant for a sequence $(x_n)_{n=m}^\infty$ of real numbers
to converge to another real number $x$; indeed, this meant that for every $\varepsilon > 0$, there
exists an $N \geq m$ such that $|x - x_n| \leq \varepsilon$ for all $n \geq N$. When this is the case, we
write $\lim_{n \to \infty} x_n = x$.

Intuitively, when a sequence $(x_n)_{n=m}^\infty$ converges to a limit $x$, this means that
somehow the elements $x_n$ of that sequence will eventually be as close to $x$ as one
pleases. One way to phrase this more precisely is to introduce the distance function
$d(x, y)$ between two real numbers by $d(x, y) := |x - y|$. (Thus for instance $d(3, 5) = 2$,
$d(5, 3) = 2$, and $d(3, 3) = 0$.) Then we have


Lemma 1.1.1 Let $(x_n)_{n=m}^\infty$ be a sequence of real numbers, and let $x$ be another real
number. Then $(x_n)_{n=m}^\infty$ converges to $x$ if and only if $\lim_{n \to \infty} d(x_n, x) = 0$.


Proof See Exercise 1.1.1. 


One would now like to generalize this notion of convergence, so that one can take 
limits not just of sequences of real numbers, but also sequences of complex numbers, 
or sequences of vectors, or sequences of matrices, or sequences of functions, even 
sequences of sequences. One way to do this is to redefine the notion of convergence 
each time we deal with a new type of object. As you can guess, this will quickly get 
tedious. A more efficient way is to work abstractly, defining a very general class of 
spaces—which includes such standard spaces as the real numbers, complex numbers, 
vectors, etc.—and define the notion of convergence on this entire class of spaces at 
once. (A space is just the set of all objects of a certain type: the space of all real numbers,
the space of all $3 \times 3$ matrices, etc. Mathematically, there is not much distinction
between a space and a set, except that spaces tend to have much more structure than 
what a random set would have. For instance, the space of real numbers comes with 
operations such as addition and multiplication, while a general set would not.) 

It turns out that there are two very useful classes of spaces which do the job. The 
first class is that of metric spaces, which we will study here. There is a more general 
class of spaces, called topological spaces, which is also very important, but we will 
only deal with this generalization briefly, in Sect. 2.5. 

Roughly speaking, a metric space is any space X which has a concept of distance 
d(x, y)—and this distance should behave in a reasonable manner. More precisely, 
we have 


Definition 1.1.2 (Metric spaces) A metric space $(X, d)$ is a space $X$ of objects
(called points), together with a distance function or metric $d : X \times X \to [0, +\infty)$,
which associates to each pair $x, y$ of points in $X$ a non-negative real number $d(x, y) \geq 0$.
Furthermore, the metric must satisfy the following four axioms:


(a) For any $x \in X$, we have $d(x, x) = 0$.

(b) (Positivity) For any distinct $x, y \in X$, we have $d(x, y) > 0$.

(c) (Symmetry) For any $x, y \in X$, we have $d(x, y) = d(y, x)$.

(d) (Triangle inequality) For any $x, y, z \in X$, we have $d(x, z) \leq d(x, y) + d(y, z)$.


In many cases it will be clear what the metric d is, and we shall abbreviate (X, d) as 
just X. 


Remark 1.1.3 The conditions (a) and (b) can be rephrased as follows: for any $x, y \in X$
we have $d(x, y) = 0$ if and only if $x = y$. (Why is this equivalent to (a) and (b)?)
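
The axioms of Definition 1.1.2 can be spot-checked numerically on a finite sample of points. The following Python sketch is an illustration only and is not part of the formal development; the helper name violated_axioms, the sample points, the tolerance, and the two candidate functions are our own assumptions. Passing such a check is evidence rather than proof, since the axioms must hold for all points of X; a reported failure, on the other hand, does show that the candidate is not a metric.

```python
from itertools import product

def violated_axioms(points, d, tol=1e-12):
    """Return the labels (a)-(d) of Definition 1.1.2 that fail on the sample `points`."""
    failed = set()
    for x in points:
        if abs(d(x, x)) > tol:                      # axiom (a): d(x, x) = 0
            failed.add("a")
    for x, y in product(points, repeat=2):
        if x != y and d(x, y) <= 0:                 # axiom (b): positivity
            failed.add("b")
        if abs(d(x, y) - d(y, x)) > tol:            # axiom (c): symmetry
            failed.add("c")
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:       # axiom (d): triangle inequality
            failed.add("d")
    return failed

sample = [-2.0, -0.5, 0.0, 1.0, 3.5]                # hypothetical sample of real numbers

def standard(x, y):
    return abs(x - y)                               # the distance d(x, y) := |x - y|

def one_sided(x, y):
    return max(x - y, 0.0)                          # a deliberately bad candidate

print(violated_axioms(sample, standard))            # empty set: no failure on this sample
print(sorted(violated_axioms(sample, one_sided)))   # ['b', 'c']
```

The second candidate vanishes whenever x < y, so it fails positivity and symmetry at once, while still obeying (a) and the triangle inequality on this sample.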


Example 1.1.4 (The real line) Let $\mathbf{R}$ be the real numbers, and let $d : \mathbf{R} \times \mathbf{R} \to [0, \infty)$
be the metric $d(x, y) := |x - y|$ mentioned earlier. Then $(\mathbf{R}, d)$ is a metric
space (Exercise 1.1.2). We refer to d as the standard metric on R, and if we refer 
to R as a metric space, we assume that the metric is given by the standard metric d 
unless otherwise specified. 


Example 1.1.5 (Induced metric spaces) Let $(X, d)$ be any metric space, and let $Y$
be a subset of $X$. Then we can restrict the metric function $d : X \times X \to [0, +\infty)$ to
the subset $Y \times Y$ of $X \times X$ to create a restricted metric function $d|_{Y \times Y} : Y \times Y \to [0, +\infty)$
of $Y$; this is known as the metric on $Y$ induced by the metric $d$ on $X$. The
pair $(Y, d|_{Y \times Y})$ is a metric space (Exercise 1.1.4) and is known as the subspace of $(X, d)$
induced by Y. Thus for instance the metric on the real line in the previous example 
induces a metric space structure on any subset of the reals, such as the integers Z, or 
an interval [a, b]. 


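Example 1.1.6 (Euclidean spaces) Let $n \geq 1$ be a natural number, and let $\mathbf{R}^n$ be the
space of $n$-tuples of real numbers:

$$\mathbf{R}^n = \{(x_1, x_2, \ldots, x_n) : x_1, \ldots, x_n \in \mathbf{R}\}.$$

We define the Euclidean metric (also called the $l^2$ metric) $d_{l^2} : \mathbf{R}^n \times \mathbf{R}^n \to \mathbf{R}$ by

$$d_{l^2}((x_1, \ldots, x_n), (y_1, \ldots, y_n)) := \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2} = \left( \sum_{i=1}^n (x_i - y_i)^2 \right)^{1/2}.$$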

Thus for instance, if $n = 2$, then $d_{l^2}((1, 6), (4, 2)) = \sqrt{3^2 + 4^2} = 5$. This metric
corresponds to the geometric distance between the two points $(x_1, x_2, \ldots, x_n)$,
$(y_1, y_2, \ldots, y_n)$ as given by Pythagoras’ theorem. (We remark however that while
geometry does give some very important examples of metric spaces, it is possible
to have metric spaces which have no obvious geometry whatsoever. Some examples
are given below.) The verification that $(\mathbf{R}^n, d_{l^2})$ is indeed a metric space can be seen
geometrically (for instance, the triangle inequality now asserts that the length of
one side of a triangle is always less than or equal to the sum of the lengths of the
other two sides), but can also be proven algebraically (see Exercise 1.1.6). We refer
to $(\mathbf{R}^n, d_{l^2})$ as the Euclidean space of dimension $n$. Extending the convention from
Example 1.1.4, if we refer to $\mathbf{R}^n$ as a metric space, we assume that the metric is given
by the Euclidean metric unless otherwise specified.


Example 1.1.7 (Taxicab metric) Again let $n \geq 1$, and let $\mathbf{R}^n$ be as before. But now
we use a different metric $d_{l^1}$, the so-called taxicab metric (or $l^1$ metric), defined by

$$d_{l^1}((x_1, x_2, \ldots, x_n), (y_1, y_2, \ldots, y_n)) := |x_1 - y_1| + \cdots + |x_n - y_n| = \sum_{i=1}^n |x_i - y_i|.$$


Thus for instance, if $n = 2$, then $d_{l^1}((1, 6), (4, 2)) = 3 + 4 = 7$. This metric is called
the taxicab metric, because it models the distance a taxicab would have to traverse
to get from one point to another if the cab was only allowed to move in cardinal
directions (north, south, east, west) and not diagonally. As such it is always at least
as large as the Euclidean metric, which measures distance “as the crow flies”, as it
were. We claim that the space $(\mathbf{R}^n, d_{l^1})$ is also a metric space (Exercise 1.1.7). The
metrics are not quite the same, but we do have the inequalities

$$d_{l^2}(x, y) \leq d_{l^1}(x, y) \leq \sqrt{n}\, d_{l^2}(x, y) \tag{1.1}$$

for all $x, y$ (see Exercise 1.1.8).
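
The two worked values above, and the inequalities (1.1), can be reproduced numerically. The sketch below is only an illustration: the function names d_l2, d_l1 and check_inequality_1_1, the sampling range, and the tolerance are our own choices. Random sampling can refute an inequality but cannot establish it, so this is no substitute for Exercise 1.1.8.

```python
import math
import random

def d_l2(x, y):
    """Euclidean (l^2) distance between two points of R^n given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def d_l1(x, y):
    """Taxicab (l^1) distance between two points of R^n given as sequences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def check_inequality_1_1(n, trials=1000, seed=0, tol=1e-9):
    """Test d_l2 <= d_l1 <= sqrt(n) * d_l2 on randomly sampled pairs of points in R^n."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(n)]
        y = [rng.uniform(-10, 10) for _ in range(n)]
        if d_l2(x, y) > d_l1(x, y) + tol:
            return False
        if d_l1(x, y) > math.sqrt(n) * d_l2(x, y) + tol:
            return False
    return True

print(d_l2((1, 6), (4, 2)))     # 5.0
print(d_l1((1, 6), (4, 2)))     # 7
print(check_inequality_1_1(4))  # True
```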


Remark 1.1.8 The taxicab metric is useful in several places, for instance in the theory
of error-correcting codes. A string of $n$ binary digits can be thought of as an element of
$\mathbf{R}^n$, for instance the binary string 10010 can be thought of as the point $(1, 0, 0, 1, 0)$
in $\mathbf{R}^5$. The taxicab distance between two binary strings is then the number of bits in
the two strings which do not match, for instance $d_{l^1}(10010, 10101) = 3$. The goal
of error-correcting codes is to encode each piece of information (e.g., a letter of the 
alphabet) as a binary string in such a way that all the binary strings are as far away 
in the taxicab metric from each other as possible; this minimizes the chance that any 
distortion of the bits due to random noise can accidentally change one of the coded 
binary strings to another and also maximizes the chance that any such distortion can 
be detected and correctly repaired. 
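
To make the remark concrete, here is a small Python sketch. The four-word code and the helper names hamming, as_point and nearest_codeword are hypothetical and chosen purely for illustration; real error-correcting codes are constructed far more carefully.

```python
def d_l1(x, y):
    """Taxicab (l^1) distance between two points of R^n."""
    return sum(abs(a - b) for a, b in zip(x, y))

def as_point(s):
    """View a binary string such as '10010' as a point of R^n."""
    return tuple(int(c) for c in s)

def hamming(s, t):
    """Number of positions at which two equal-length binary strings differ."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(s, t))

def nearest_codeword(received, code):
    """Decode by choosing the codeword closest to `received` in the taxicab metric."""
    return min(code, key=lambda w: hamming(received, w))

code = ["00000", "11100", "00111", "11011"]          # hypothetical toy code
min_distance = min(hamming(u, v) for i, u in enumerate(code) for v in code[i + 1:])

print(d_l1(as_point("10010"), as_point("10101")))    # 3
print(hamming("10010", "10101"))                      # 3
print(min_distance)                                   # 3
print(nearest_codeword("10100", code))                # 11100
```

In this toy code any two distinct codewords are at distance at least 3. If a single bit of a codeword is flipped, the received string is at distance 1 from the original codeword and, by the triangle inequality, at distance at least 2 from every other codeword, so nearest-codeword decoding recovers the original.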
Example 1.1.9 (Sup norm metric) Again let $n \geq 1$, and let $\mathbf{R}^n$ be as before. But now
we use a different metric $d_{l^\infty}$, the so-called sup norm metric (or $l^\infty$ metric), defined
by

$$d_{l^\infty}((x_1, x_2, \ldots, x_n), (y_1, y_2, \ldots, y_n)) := \sup\{|x_i - y_i| : 1 \leq i \leq n\}.$$

Thus for instance, if $n = 2$, then $d_{l^\infty}((1, 6), (4, 2)) = \sup(3, 4) = 4$. The space
$(\mathbf{R}^n, d_{l^\infty})$ is also a metric space (Exercise 1.1.9) and is related to the $l^2$ metric by the
inequalities

$$\frac{1}{\sqrt{n}}\, d_{l^2}(x, y) \leq d_{l^\infty}(x, y) \leq d_{l^2}(x, y) \tag{1.2}$$

for all $x, y$ (see Exercise 1.1.10).
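
As with (1.1), the inequalities (1.2) can be tested numerically, and one can exhibit points at which each bound is attained. The function names and sample points below are our own assumptions, and the random test is a sanity check only, not a proof.

```python
import math
import random

def d_l2(x, y):
    """Euclidean (l^2) distance between two points of R^n."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def d_linf(x, y):
    """Sup norm (l^infinity) distance between two points of R^n."""
    return max(abs(a - b) for a, b in zip(x, y))

def check_inequality_1_2(n, trials=1000, seed=0, tol=1e-9):
    """Test d_l2 / sqrt(n) <= d_linf <= d_l2 on randomly sampled pairs in R^n."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(n)]
        y = [rng.uniform(-10, 10) for _ in range(n)]
        if d_l2(x, y) / math.sqrt(n) > d_linf(x, y) + tol:
            return False
        if d_linf(x, y) > d_l2(x, y) + tol:
            return False
    return True

print(d_linf((1, 6), (4, 2)))      # 4
print(check_inequality_1_2(4))     # True

# Both bounds in (1.2) are attained when n = 4:
origin = (0, 0, 0, 0)
print(d_l2(origin, (3, 3, 3, 3)) / math.sqrt(4), d_linf(origin, (3, 3, 3, 3)))  # 3.0 3
print(d_l2(origin, (3, 0, 0, 0)), d_linf(origin, (3, 0, 0, 0)))                  # 3.0 3
```

The left-hand bound is attained when all coordinates differ by the same amount, and the right-hand bound when only one coordinate differs.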


Remark 1.1.10 The $l^1$, $l^2$, and $l^\infty$ metrics are special cases of the more general $l^p$
metrics, where $p \in [1, +\infty]$, but we will not discuss these more general metrics in
this text.


Example 1.1.11 (Discrete metric) Let $X$ be an arbitrary set (finite or infinite),
and define the discrete metric $d_{\mathrm{disc}}$ by setting $d_{\mathrm{disc}}(x, y) := 0$ when $x = y$, and
$d_{\mathrm{disc}}(x, y) := 1$ when $x \neq y$. Thus, in this metric, all points are equally far apart.
The space $(X, d_{\mathrm{disc}})$ is a metric space (Exercise 1.1.11). Thus every set $X$ has at least
one metric on it.
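
Since the discrete metric only asks whether two objects are equal, it makes sense on sets with no numerical structure at all. A minimal Python sketch follows; the four-element set X is an arbitrary assumption, and the exhaustive check covers only this particular finite set, so it does not replace Exercise 1.1.11.

```python
from itertools import product

def d_disc(x, y):
    """Discrete metric: 0 when the two points coincide, 1 otherwise."""
    return 0 if x == y else 1

X = ["apple", 3, (1, 2), None]     # hypothetical set of unrelated objects

# Exhaustive check of the triangle inequality over all triples from X.
triangle_ok = all(
    d_disc(x, z) <= d_disc(x, y) + d_disc(y, z)
    for x, y, z in product(X, repeat=3)
)

print(d_disc("apple", "apple"))    # 0
print(d_disc("apple", 3))          # 1
print(triangle_ok)                 # True
```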


Example 1.1.12 (Geodesics) (Informal) Let $X$ be the sphere $\{(x, y, z) \in \mathbf{R}^3 : x^2 + y^2 + z^2 = 1\}$,
and let $d((x, y, z), (x', y', z'))$ be the length of the shortest curve in
$X$ which starts at $(x, y, z)$ and ends at $(x', y', z')$. (This curve turns out to be an arc
of a great circle; we will not prove this here, as it requires calculus of variations, 
which is beyond the scope of this text.) This makes X into a metric space; the reader 
should be able to verify (without using any geometry of the sphere) that the triangle 
inequality is more or less automatic from the definition. 


Example 1.1.13 (Shortest paths) (Informal) Examples of metric spaces occur all 
the time in real life. For instance, X could be all the computers currently connected 
to the internet, and d(x, y) is the shortest number of connections it would take for 
a packet to travel from computer x to computer y; for instance, if x and y are not 
directly connected, but are both connected to z, then d(x, y) = 2. Assuming that all 
computers in the internet can ultimately be connected to all other computers (so that 
d(x, y) is always finite), then (X, d) is a metric space (why?). Games such as “six 
degrees of separation” are also taking place in a similar metric space (what is the 
space, and what is the metric, in this case?). Or, X could be a major city, and d(x, y) 
could be the shortest time it takes to drive from x to y (although this space might not 
satisfy axiom (c) in real life!). 
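
The network example can be simulated on a small graph. In the Python sketch below, the four-node network and the function name hop_distances are hypothetical; the least number of connections is computed by breadth-first search, and symmetry and the triangle inequality are then checked exhaustively for this one network only.

```python
from collections import deque
from itertools import product

def hop_distances(graph, source):
    """Breadth-first search: least number of connections from `source` to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:          # first visit is along a shortest route
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# x and y are not directly connected, but both are connected to z.
network = {"x": ["z"], "y": ["z", "w"], "z": ["x", "y"], "w": ["y"]}
d = {u: hop_distances(network, u) for u in network}

symmetric = all(d[u][v] == d[v][u] for u, v in product(network, repeat=2))
triangle = all(d[u][w] <= d[u][v] + d[v][w] for u, v, w in product(network, repeat=3))

print(d["x"]["y"])           # 2
print(d["x"]["w"])           # 3
print(symmetric, triangle)   # True True
```

Symmetry relies on every connection being usable in both directions; with one-way connections, as with one-way streets in the driving example, axiom (c) can fail.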


Now that we have metric spaces, we can define convergence in these spaces. 


Definition 1.1.14 (Convergence of sequences in metric spaces) Let $m$ be an integer,
$(X, d)$ be a metric space, and let $(x^{(n)})_{n=m}^\infty$ be a sequence of points in $X$ (i.e., for
every natural number $n \geq m$, we assume that $x^{(n)}$ is an element of $X$). Let $x$ be a
point in $X$. We say that $(x^{(n)})_{n=m}^\infty$ converges to $x$ with respect to the metric $d$, if and
only if the limit $\lim_{n \to \infty} d(x^{(n)}, x)$ exists and is equal to 0. In other words, $(x^{(n)})_{n=m}^\infty$
converges to $x$ with respect to $d$ if and only if for every $\varepsilon > 0$, there exists an $N \geq m$
such that $d(x^{(n)}, x) \leq \varepsilon$ for all $n \geq N$. (Why are these two definitions equivalent?)
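
The “for every $\varepsilon$ there exists an $N$” formulation can be illustrated for the sequence $(1/n)_{n=1}^\infty$ in the real line with the metric $d(x, y) := |x - y|$. In the sketch below, the names witness_N and spot_check and the finite horizon are our own assumptions; the code can only examine finitely many $n$, so it illustrates the definition without proving convergence.

```python
import math

def d(x, y):
    """The standard metric on the real line."""
    return abs(x - y)

def witness_N(eps):
    """One valid choice of N (not necessarily the smallest) with 1/n <= eps for n >= N."""
    return math.floor(1 / eps) + 1

def spot_check(eps, horizon=10_000):
    """Check d(1/n, 0) <= eps for n = N, N + 1, ..., N + horizon - 1."""
    N = witness_N(eps)
    return all(d(1 / n, 0.0) <= eps for n in range(N, N + horizon))

for eps in (0.5, 0.1, 0.001):
    print(eps, witness_N(eps), spot_check(eps))
```

Note how the order of the quantifiers is reflected in the code: $N$ is computed from $\varepsilon$, and a smaller $\varepsilon$ forces a larger $N$.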


Remark 1.1.15 In view of Lemma 1.1.1 we see that this definition generalizes our
existing notion of convergence of sequences of real numbers. In many cases, it is
obvious what the metric $d$ is, and so we shall often just say “$(x^{(n)})_{n=m}^\infty$ converges to
$x$” instead of “$(x^{(n)})_{n=m}^\infty$ converges to $x$ with respect to the metric $d$” when there is
no chance of confusion. We also sometimes write “$x^{(n)} \to x$ as $n \to \infty$” instead.


Remark 1.1.16 There is nothing special about the superscript $n$ in the above definition;
it is a dummy variable. Saying that $(x^{(n)})_{n=m}^\infty$ converges to $x$ is exactly the
same statement as saying that $(x^{(k)})_{k=m}^\infty$ converges to $x$, for example; and sometimes
it is convenient to change superscripts, for instance if the variable $n$ is already being
used for some other purpose. Similarly, it is not necessary for the sequence $x^{(n)}$ to be
denoted using the superscript $(n)$; the above definition is also valid for sequences $x_n$,
or functions $f(n)$, or indeed for any expression which depends on $n$ and takes values
in $X$. Finally, from Exercises 6.1.3 and 6.1.4 we see that the starting point $m$ of the
sequence is unimportant for the purposes of taking limits; if $(x^{(n)})_{n=m}^\infty$ converges to
$x$, then $(x^{(n)})_{n=m'}^\infty$ also converges to $x$ for any $m' \geq m$.


Example 1.1.17 We work in the Euclidean space $\mathbf{R}^2$ with the standard Euclidean
metric $d_{l^2}$. Let $(x^{(n)})_{n=1}^\infty$ denote the sequence $x^{(n)} := (1/n, 1/n)$ in $\mathbf{R}^2$, i.e., we are
considering the sequence $(1, 1), (1/2, 1/2), (1/3, 1/3), \ldots$. Then this sequence
converges to $(0, 0)$ with respect to the Euclidean metric $d_{l^2}$, since

$$\lim_{n \to \infty} d_{l^2}(x^{(n)}, (0, 0)) = \lim_{n \to \infty} \sqrt{\frac{1}{n^2} + \frac{1}{n^2}} = \lim_{n \to \infty} \frac{\sqrt{2}}{n} = 0.$$

The sequence $(x^{(n)})_{n=1}^\infty$ also converges to $(0, 0)$ with respect to the taxicab metric
$d_{l^1}$, since

$$\lim_{n \to \infty} d_{l^1}(x^{(n)}, (0, 0)) = \lim_{n \to \infty} \frac{1}{n} + \frac{1}{n} = \lim_{n \to \infty} \frac{2}{n} = 0.$$

Similarly the sequence converges to $(0, 0)$ in the sup norm metric $d_{l^\infty}$ (why?). However,
the sequence $(x^{(n)})_{n=1}^\infty$ does not converge to $(0, 0)$ in the discrete metric $d_{\mathrm{disc}}$,
since

$$\lim_{n \to \infty} d_{\mathrm{disc}}(x^{(n)}, (0, 0)) = \lim_{n \to \infty} 1 = 1 \neq 0.$$

Thus the convergence of a sequence can depend on what metric one uses.¹


¹ For a somewhat whimsical real-life example, one can give a city an “automobile metric”, with
$d(x, y)$ defined as the time it takes for a car to drive from $x$ to $y$, or a “pedestrian metric”, where
$d(x, y)$ is the time it takes to walk on foot from $x$ to $y$. (Let us assume for sake of argument that
In the case of the above four metrics (Euclidean, taxicab, sup norm, and
discrete) it is in fact rather easy to test for convergence.


Proposition 1.1.18 (Equivalence of $l^1$, $l^2$, $l^\infty$) Let $\mathbf{R}^n$ be a Euclidean space, and let
$(x^{(k)})_{k=m}^\infty$ be a sequence of points in $\mathbf{R}^n$. We write $x^{(k)} = (x_1^{(k)}, x_2^{(k)}, \ldots, x_n^{(k)})$, i.e., for
$j = 1, 2, \ldots, n$, $x_j^{(k)} \in \mathbf{R}$ is the $j$th co-ordinate of $x^{(k)} \in \mathbf{R}^n$. Let $x = (x_1, \ldots, x_n)$
be a point in $\mathbf{R}^n$. Then the following four statements are equivalent:

(a) $(x^{(k)})_{k=m}^\infty$ converges to $x$ with respect to the Euclidean metric $d_{l^2}$.

(b) $(x^{(k)})_{k=m}^\infty$ converges to $x$ with respect to the taxicab metric $d_{l^1}$.

(c) $(x^{(k)})_{k=m}^\infty$ converges to $x$ with respect to the sup norm metric $d_{l^\infty}$.

(d) For every $1 \leq j \leq n$, the sequence $(x_j^{(k)})_{k=m}^\infty$ converges to $x_j$. (Notice that this
is a sequence of real numbers, not of points in $\mathbf{R}^n$.)


Proof See Exercise 1.1.12. 


In other words, a sequence converges in the Euclidean, taxicab, or sup norm 
metric if and only if each of its components converges individually. Because of 
the equivalence of (a), (b), and (c), we say that the Euclidean, taxicab, and sup 
norm metrics on R” are equivalent. (There are infinite-dimensional analogues of the 
Euclidean, taxicab, and sup norm metrics which are not equivalent, see for instance 
Exercise 1.1.15.) 

For the discrete metric, convergence is much rarer: the sequence must be eventu- 
ally constant in order to converge. 


Proposition 1.1.19 (Convergence in the discrete metric) Let $X$ be any set, and let
$d_{\mathrm{disc}}$ be the discrete metric on $X$. Let $(x^{(n)})_{n=m}^\infty$ be a sequence of points in $X$, and let
$x$ be a point in $X$. Then $(x^{(n)})_{n=m}^\infty$ converges to $x$ with respect to the discrete metric
$d_{\mathrm{disc}}$ if and only if there exists an $N \geq m$ such that $x^{(n)} = x$ for all $n \geq N$.


Proof See Exercise 1.1.13. 
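
The contrast between the four metrics can be seen numerically for the sequence $x^{(n)} := (1/n, 1/n)$ of Example 1.1.17. The following Python sketch tabulates the distance from $x^{(n)}$ to $(0, 0)$ in each metric; the function names and the chosen values of $n$ are our own, and a finite table of course only suggests, and does not prove, the limiting behaviour.

```python
import math

def d_l2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def d_l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def d_linf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def d_disc(x, y):
    return 0 if tuple(x) == tuple(y) else 1

def x(n):
    """The n-th term (1/n, 1/n) of the sequence in Example 1.1.17."""
    return (1 / n, 1 / n)

origin = (0.0, 0.0)
for n in (1, 10, 100, 1000):
    p = x(n)
    print(n, d_l2(p, origin), d_l1(p, origin), d_linf(p, origin), d_disc(p, origin))
```

The first three columns shrink towards 0 together, as Proposition 1.1.18 predicts, while the last column is always 1: no term of the sequence equals $(0, 0)$, so by Proposition 1.1.19 the sequence cannot converge to $(0, 0)$ in the discrete metric.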


We now prove a basic fact about converging sequences; they can only converge 
to at most one point at a time. 


Proposition 1.1.20 (Uniqueness of limits) Let $(X, d)$ be a metric space, and let
$(x^{(n)})_{n=m}^\infty$ be a sequence in $X$. Suppose that there are two points $x, x' \in X$ such that
$(x^{(n)})_{n=m}^\infty$ converges to $x$ with respect to $d$, and $(x^{(n)})_{n=m}^\infty$ also converges to $x'$ with
respect to $d$. Then we have $x = x'$.


Proof See Exercise 1.1.14. 


Because of the above proposition, it is safe to introduce the following notation:
if $(x^{(n)})_{n=m}^\infty$ converges to $x$ in the metric $d$, then we write $d-\lim_{n \to \infty} x^{(n)} = x$, or
simply $\lim_{n \to \infty} x^{(n)} = x$ when there is no confusion as to what $d$ is. For instance, in
the example of $(\frac{1}{n}, \frac{1}{n})$, we have


these metrics are symmetric, though this is not always the case in real life.) One can easily imagine 
examples where two points are close in one metric but not another. 
$$d_{l^2}-\lim_{n \to \infty} \left(\frac{1}{n}, \frac{1}{n}\right) = d_{l^1}-\lim_{n \to \infty} \left(\frac{1}{n}, \frac{1}{n}\right) = (0, 0),$$


but $d_{\mathrm{disc}}-\lim_{n \to \infty} (\frac{1}{n}, \frac{1}{n})$ is undefined. Thus the meaning of $d-\lim_{n \to \infty} x^{(n)}$ can
depend on what $d$ is; however Proposition 1.1.20 assures us that once $d$ is fixed, there
can be at most one value of $d-\lim_{n \to \infty} x^{(n)}$. (Of course, it is still possible that this
limit does not exist; some sequences are not convergent.) Note that by Lemma 1.1.1,
this definition of limit generalizes the notion of limit in Definition 6.1.8.


Remark 1.1.21 It is possible for a sequence to converge to one point using one
metric, and another point using a different metric, although such examples are
usually quite artificial. For instance, let $X := [0,1]$, the closed interval from 0 to 1.
Using the usual metric $d$, we have $d-\lim_{n\to\infty} \frac{1}{n} = 0$. But now suppose we "swap"
the points 0 and 1 in the following manner. Let $f: [0,1] \to [0,1]$ be the function
defined by $f(0) := 1$, $f(1) := 0$, and $f(x) := x$ for all $x \in (0,1)$, and then
define $d'(x,y) := d(f(x), f(y))$. Then $(X, d')$ is still a metric space (why?), but
now $d'-\lim_{n\to\infty} \frac{1}{n} = 1$. Thus changing the metric on a space can greatly affect the
nature of convergence (also called the topology) on that space; see Sect. 2.5 for a
further discussion of topology.
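
The swap in Remark 1.1.21 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `d`, `f`, and `d_swapped` are chosen here for illustration, and exact rational arithmetic is used so that no rounding enters. It computes the distance from 1/n to the candidate limits under both metrics.

```python
from fractions import Fraction

def d(x, y):
    """Usual metric on [0, 1]."""
    return abs(x - y)

def f(x):
    """Swap the endpoints 0 and 1; leave every other point of [0, 1] fixed."""
    if x == 0:
        return Fraction(1)
    if x == 1:
        return Fraction(0)
    return x

def d_swapped(x, y):
    """The metric d'(x, y) := d(f(x), f(y)) of Remark 1.1.21."""
    return d(f(x), f(y))

# For n >= 2 the point 1/n lies in (0, 1), so f leaves it alone.
for n in (2, 4, 8, 16, 1024):
    x_n = Fraction(1, n)
    # d(x_n, 0) and d'(x_n, 1) both equal 1/n, while d'(x_n, 0) equals 1 - 1/n.
    print(n, d(x_n, 0), d_swapped(x_n, 1), d_swapped(x_n, 0))
```

Under d' the distance from 1/n to the point 1 shrinks to zero, while the distance to the point 0 approaches 1, which is the behaviour the remark describes.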


— Exercises — 
Exercise 1.1.1 Prove Lemma 1.1.1. 


Exercise 1.1.2 Show that the real line with the metric $d(x, y) := |x - y|$ is indeed
a metric space. (Hint: you may wish to review your proof of Proposition 4.3.3.)


Exercise 1.1.3 Let $X$ be a set, and let $d : X \times X \to [0, \infty)$ be a function.


(a) Give an example of a pair (X, d) which obeys axioms (bcd) of Definition 1.1.2, 
but not (a). (Hint: modify the discrete metric.) 

(b) Give an example of a pair (X, d) which obeys axioms (acd) of Definition 1.1.2, 
but not (b). 

(c) Give an example of a pair (X, d) which obeys axioms (abd) of Definition 1.1.2, 
but not (c). 

(d) Give an example of a pair (X, d) which obeys axioms (abc) of Definition 1.1.2, 
but not (d). (Hint: try examples where X is a finite set.) 


Exercise 1.1.4 Show that the pair $(Y, d|_{Y \times Y})$ defined in Example 1.1.5 is indeed a
metric space.


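Exercise 1.1.5 Let $n \ge 1$, and let $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ be real numbers.
Verify the identity

$$\left(\sum_{i=1}^n a_i b_i\right)^2 + \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n (a_i b_j - a_j b_i)^2 = \left(\sum_{i=1}^n a_i^2\right) \left(\sum_{j=1}^n b_j^2\right)$$

and conclude the Cauchy–Schwarz inequality

$$\left|\sum_{i=1}^n a_i b_i\right| \le \left(\sum_{i=1}^n a_i^2\right)^{1/2} \left(\sum_{j=1}^n b_j^2\right)^{1/2}. \qquad (1.3)$$

Then use the Cauchy–Schwarz inequality to prove the triangle inequality

$$\left(\sum_{i=1}^n (a_i + b_i)^2\right)^{1/2} \le \left(\sum_{i=1}^n a_i^2\right)^{1/2} + \left(\sum_{j=1}^n b_j^2\right)^{1/2}.$$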
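
Before proving the identity in Exercise 1.1.5, it can help to test it on concrete numbers. The sketch below is an illustration only and proves nothing; the helper names `lagrange_sides` and `cauchy_schwarz_holds` are chosen here. Both sides of the identity are doubled so that integer inputs stay in exact integer arithmetic.

```python
from itertools import product

def lagrange_sides(a, b):
    """Return (2 * left-hand side, 2 * right-hand side) of the identity
    (sum a_i b_i)^2 + (1/2) sum_{i,j} (a_i b_j - a_j b_i)^2
        = (sum a_i^2) (sum b_j^2).
    Doubling both sides avoids the factor 1/2."""
    n = len(a)
    dot = sum(a[i] * b[i] for i in range(n))
    cross = sum((a[i] * b[j] - a[j] * b[i]) ** 2
                for i, j in product(range(n), repeat=2))
    lhs2 = 2 * dot ** 2 + cross
    rhs2 = 2 * sum(x * x for x in a) * sum(y * y for y in b)
    return lhs2, rhs2

def cauchy_schwarz_holds(a, b):
    """Check the squared form of (1.3): (sum a_i b_i)^2 <= (sum a_i^2)(sum b_j^2)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot ** 2 <= sum(x * x for x in a) * sum(y * y for y in b)

a = [1, 2, 3]
b = [4, -5, 6]
print(lagrange_sides(a, b))        # the two entries agree
print(cauchy_schwarz_holds(a, b))
```

Since the double sum in the identity is a sum of squares, it is non-negative, which is exactly why the squared Cauchy–Schwarz inequality follows.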


Exercise 1.1.6 Show that $(\mathbf{R}^n, d_{l^2})$ in Example 1.1.6 is indeed a metric space. (Hint:
use Exercise 1.1.5.)


Exercise 1.1.7 Show that the pair $(\mathbf{R}^n, d_{l^1})$ in Example 1.1.7 is indeed a metric
space.


Exercise 1.1.8 Prove the two inequalities in (1.1). (Hint: For the first inequality,
square both sides. For the second inequality, use Exercise 1.1.5.)


Exercise 1.1.9 Show that the pair $(\mathbf{R}^n, d_{l^\infty})$ in Example 1.1.9 is indeed a metric
space.


Exercise 1.1.10 Prove the two inequalities in (1.2).


Exercise 1.1.11 Show that the discrete metric $(X, d_{\mathrm{disc}})$ in Example 1.1.11 is indeed
a metric space.


Exercise 1.1.12 Prove Proposition 1.1.18. 
Exercise 1.1.13. Prove Proposition 1.1.19. 


Exercise 1.1.14 Prove Proposition 1.1.20. (Hint: modify the proof of Proposition 
6.1.7.) 


Exercise 1.1.15 Let

$$X := \left\{ (a_n)_{n=0}^\infty : \sum_{n=0}^\infty |a_n| < \infty \right\}$$

be the space of absolutely convergent sequences. Define the $l^1$ and $l^\infty$ metrics on
this space by

$$d_{l^1}\left((a_n)_{n=0}^\infty, (b_n)_{n=0}^\infty\right) := \sum_{n=0}^\infty |a_n - b_n|;$$

$$d_{l^\infty}\left((a_n)_{n=0}^\infty, (b_n)_{n=0}^\infty\right) := \sup_{n \in \mathbf{N}} |a_n - b_n|.$$

Show that these are both metrics on $X$, but show that there exist sequences
$x^{(1)}, x^{(2)}, \ldots$ of elements of $X$ (i.e., sequences of sequences) which are convergent
with respect to the $d_{l^\infty}$ metric but not with respect to the $d_{l^1}$ metric. Conversely, show
that any sequence which converges in the $d_{l^1}$ metric automatically converges in the
$d_{l^\infty}$ metric.


Exercise 1.1.16 Let $(x_n)_{n=1}^\infty$ and $(y_n)_{n=1}^\infty$ be two sequences in a metric space $(X, d)$.
Suppose that $(x_n)_{n=1}^\infty$ converges to a point $x \in X$, and $(y_n)_{n=1}^\infty$ converges to a point
$y \in X$. Show that $\lim_{n\to\infty} d(x_n, y_n) = d(x, y)$. (Hint: use the triangle inequality
several times.)


1.2 Some Point-Set Topology of Metric Spaces 


Having defined the operation of convergence on metric spaces, we now define a
number of other related notions, including those of open set, closed set, interior,
exterior, boundary, and adherent point. The study of such notions is known as point-set
topology, which we shall return to in Sect. 2.5.

We first need the notion of a metric ball, or more simply a ball.


Definition 1.2.1 (Balls) Let $(X, d)$ be a metric space, let $x_0$ be a point in $X$, and let
$r > 0$. We define the ball $B_{(X,d)}(x_0, r)$ in $X$, centered at $x_0$, and with radius $r$, in the
metric $d$, to be the set

$$B_{(X,d)}(x_0, r) := \{x \in X : d(x, x_0) < r\}.$$

When it is clear what the metric space $(X, d)$ is, we shall abbreviate $B_{(X,d)}(x_0, r)$ as
just $B(x_0, r)$.


Example 1.2.2 In $\mathbf{R}^2$ with the Euclidean metric $d_{l^2}$, the ball $B_{(\mathbf{R}^2, d_{l^2})}((0,0), 1)$ is the
open disc

$$B_{(\mathbf{R}^2, d_{l^2})}((0,0), 1) = \{(x, y) \in \mathbf{R}^2 : x^2 + y^2 < 1\}.$$

However, if one uses the taxicab metric $d_{l^1}$ instead, then we obtain a diamond:

$$B_{(\mathbf{R}^2, d_{l^1})}((0,0), 1) = \{(x, y) \in \mathbf{R}^2 : |x| + |y| < 1\}.$$

If we use the discrete metric, the ball is now reduced to a single point:

$$B_{(\mathbf{R}^2, d_{\mathrm{disc}})}((0,0), 1) = \{(0,0)\},$$

although if one increases the radius to be larger than 1, then the ball now encompasses
all of $\mathbf{R}^2$. (Why?)
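
The three unit balls of Example 1.2.2 can be compared by testing individual points for membership. This is a small sketch with names (`d_l2`, `d_l1`, `d_disc`, `in_ball`) chosen here for illustration; it relies on floating-point arithmetic, so it should only be trusted for points that are not extremely close to the edge of a ball.

```python
import math

def d_l2(p, q):
    """Euclidean metric on R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def d_l1(p, q):
    """Taxicab metric on R^2."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d_disc(p, q):
    """Discrete metric: 0 for equal points, 1 otherwise."""
    return 0 if p == q else 1

def in_ball(metric, center, radius, point):
    """Membership test for the open ball {x : metric(x, center) < radius}."""
    return metric(point, center) < radius

origin = (0.0, 0.0)
p = (0.6, 0.6)
print(in_ball(d_l2, origin, 1, p))      # inside the open disc
print(in_ball(d_l1, origin, 1, p))      # outside the diamond
print(in_ball(d_disc, origin, 1, p))    # the discrete unit ball is only the origin
print(in_ball(d_disc, origin, 1.5, p))  # radius > 1 captures every point
```

The point (0.6, 0.6) separates the disc from the diamond, since its Euclidean distance to the origin is about 0.85 while its taxicab distance is 1.2.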


Example 1.2.3 In $\mathbf{R}$ with the usual metric $d$, the open interval $(3, 7)$ is also the
metric ball $B_{(\mathbf{R},d)}(5, 2)$.


Remark 1.2.4 Note that the smaller the radius $r$, the smaller the ball $B(x_0, r)$.
However, $B(x_0, r)$ always contains at least one point, namely the center $x_0$, as long as $r$
stays positive, thanks to Definition 1.1.2(a). (We don't consider balls of zero radius
or negative radius since they are rather boring, being just the empty set.)


Using metric balls, one can now take a set $E$ in a metric space $X$ and classify
three types of points in $X$: interior, exterior, and boundary points of $E$.


Definition 1.2.5 (Interior, exterior, boundary) Let $(X,d)$ be a metric space, let $E$
be a subset of $X$, and let $x_0$ be a point in $X$. We say that $x_0$ is an interior point of $E$
if there exists a radius $r > 0$ such that $B(x_0, r) \subseteq E$. We say that $x_0$ is an exterior
point of $E$ if there exists a radius $r > 0$ such that $B(x_0, r) \cap E = \emptyset$. We say that $x_0$
is a boundary point of $E$ if it is neither an interior point nor an exterior point of $E$.


The set of all interior points of $E$ is called the interior of $E$ and is sometimes
denoted $\mathrm{int}(E)$. The set of exterior points of $E$ is called the exterior of $E$ and is
sometimes denoted $\mathrm{ext}(E)$. The set of boundary points of $E$ is called the boundary
of $E$ and is sometimes denoted $\partial E$.


Remark 1.2.6 If $x_0$ is an interior point of $E$, then $x_0$ must actually be an element of
$E$, since balls $B(x_0, r)$ always contain their center $x_0$. Conversely, if $x_0$ is an exterior
point of $E$, then $x_0$ cannot be an element of $E$. In particular it is not possible for $x_0$
to simultaneously be an interior and an exterior point of $E$. If $x_0$ is a boundary point
of $E$, then it could be an element of $E$, but it could also not lie in $E$; we give some
examples below.


Example 1.2.7 We work on the real line $\mathbf{R}$ with the standard metric $d$. Let $E$ be
the half-open interval $E = [1,2)$. The point 1.5 is an interior point of $E$, since
one can find a ball (for instance $B(1.5, 0.1)$) centered at 1.5 which lies in $E$. The
point 3 is an exterior point of $E$, since one can find a ball (for instance $B(3, 0.1)$)
centered at 3 which is disjoint from $E$. The points 1 and 2, however, are neither
interior points nor exterior points of $E$ and are thus boundary points of $E$. Thus in
this case $\mathrm{int}(E) = (1, 2)$, $\mathrm{ext}(E) = (-\infty, 1) \cup (2, \infty)$, and $\partial E = \{1, 2\}$. Note that
in this case one of the boundary points is an element of $E$, while the other is not.
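
The classification in Example 1.2.7 can be mirrored in a few lines of code. The sketch below is specific to E = [1, 2) and is only an illustration: the two distance formulas are worked out by hand for this particular set, and the function names are chosen here. It uses the observation that a ball B(x, r) fits inside E for some r > 0 exactly when x has positive distance to the complement of E, and misses E for some r > 0 exactly when x has positive distance to E.

```python
def dist_to_E(x):
    """inf{|x - y| : y in E} for E = [1, 2)."""
    if x < 1:
        return 1 - x
    if x >= 2:
        return x - 2
    return 0

def dist_to_complement(x):
    """inf{|x - y| : y not in E}; the complement is (-inf, 1) together with [2, inf)."""
    if 1 <= x < 2:
        return min(x - 1, 2 - x)
    return 0

def classify(x):
    """Label x as an interior, exterior, or boundary point of E = [1, 2)."""
    if dist_to_complement(x) > 0:
        return "interior"   # some ball around x stays inside E
    if dist_to_E(x) > 0:
        return "exterior"   # some ball around x misses E entirely
    return "boundary"       # every ball meets both E and its complement

for x in (1.5, 3, 1, 2):
    print(x, classify(x))
```

The output labels 1.5 as interior, 3 as exterior, and both 1 and 2 as boundary points, matching the example even though 1 lies in E and 2 does not.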


Example 1.2.8 When we give a set $X$ the discrete metric $d_{\mathrm{disc}}$, and $E$ is any subset
of $X$, then every element of $E$ is an interior point of $E$, every point not contained in
$E$ is an exterior point of $E$, and there are no boundary points; see Exercise 1.2.1.


Definition 1.2.9 (Closure) Let $(X, d)$ be a metric space, let $E$ be a subset of $X$, and
let $x_0$ be a point in $X$. We say that $x_0$ is an adherent point of $E$ if for every radius
$r > 0$, the ball $B(x_0, r)$ has a non-empty intersection with $E$. The set of all adherent
points of $E$ is called the closure of $E$ and is denoted $\overline{E}$.


Note that these notions are consistent with the corresponding notions on the real 
line defined in Definitions 9.1.8 and 9.1.10 (why?). 

The following proposition links the notions of adherent point with interior and
boundary point and also to that of convergence.


Proposition 1.2.10 Let $(X, d)$ be a metric space, let $E$ be a subset of $X$, and let $x_0$
be a point in $X$. Then the following statements are logically equivalent.


(a) $x_0$ is an adherent point of $E$.

(b) $x_0$ is either an interior point or a boundary point of $E$.

(c) There exists a sequence $(x_n)_{n=1}^\infty$ in $E$ which converges to $x_0$ with respect to the
metric $d$.


Proof See Exercise 1.2.2. 


From the equivalence of Proposition 1.2.10(a) and (b) we obtain an immediate 
corollary: 


Corollary 1.2.11 Let $(X,d)$ be a metric space, and let $E$ be a subset of $X$. Then
$\overline{E} = \mathrm{int}(E) \cup \partial E = X \setminus \mathrm{ext}(E)$.


As remarked earlier, the boundary of a set E may or may not lie in E. Depending 
on how the boundary is situated, we may call a set open, closed, or neither: 


Definition 1.2.12 (Open and closed sets) Let $(X, d)$ be a metric space, and let $E$
be a subset of $X$. We say that $E$ is closed if it contains all of its boundary points,
i.e., $\partial E \subseteq E$. We say that $E$ is open if it contains none of its boundary points, i.e.,
$\partial E \cap E = \emptyset$. If $E$ contains some of its boundary points but not others, then it is
neither open nor closed.


Example 1.2.13 We work in the real line R with the standard metric d. The set (1, 2) 
does not contain either of its boundary points 1, 2 and is hence open. The set [1, 2] 
contains both of its boundary points 1, 2 and is hence closed. The set [1, 2) contains 
one of its boundary points 1, but does not contain the other boundary point 2, so is 
neither open nor closed. 


Remark 1.2.14 It is possible for a set to be simultaneously open and closed, if it
has no boundary. For instance, in a metric space $(X, d)$, the whole space $X$ has no
boundary (every point in $X$ is an interior point; why?), and so $X$ is both open and
closed. The empty set $\emptyset$ also has no boundary (every point in $X$ is an exterior point;
why?), and so $\emptyset$ is both open and closed. In many cases these are the only sets that
are simultaneously open and closed, but there are exceptions. For instance, using the
discrete metric $d_{\mathrm{disc}}$, every set is both open and closed! (why?)


From the above two remarks we see that the notions of being open and being 
closed are not negations of each other; there are sets that are both open and closed, 
and there are sets which are neither open nor closed. Thus, if one knew for instance 
that E was not an open set, it would be erroneous to conclude from this that E was 
a closed set, and similarly with the roles of open and closed reversed. The correct
relationship between open and closed sets is given by Proposition 1.2.15(e) below. 

Now we list some more properties of open and closed sets. 


Proposition 1.2.15 (Basic properties of open and closed sets) Let $(X, d)$ be a metric
space.

(a) Let $E$ be a subset of $X$. Then $E$ is open if and only if $E = \mathrm{int}(E)$. In other
words, $E$ is open if and only if for every $x \in E$, there exists an $r > 0$ such that
$B(x, r) \subseteq E$.

(b) Let $E$ be a subset of $X$. Then $E$ is closed if and only if $E$ contains all its adherent
points. In other words, $E$ is closed if and only if for every convergent sequence
$(x_n)_{n=m}^\infty$ in $E$, the limit $\lim_{n\to\infty} x_n$ of that sequence also lies in $E$.

(c) For any $x_0 \in X$ and $r > 0$, the ball $B(x_0, r)$ is an open set. The set
$\{x \in X : d(x, x_0) \le r\}$ is a closed set. (This set is sometimes called the closed ball of
radius $r$ centered at $x_0$.)

(d) Any singleton set $\{x_0\}$, where $x_0 \in X$, is automatically closed.

(e) If $E$ is a subset of $X$, then $E$ is open if and only if the complement
$X \setminus E := \{x \in X : x \notin E\}$ is closed.

(f) If $E_1, \ldots, E_n$ is a finite collection of open sets in $X$, then $E_1 \cap E_2 \cap \cdots \cap E_n$ is
also open. If $F_1, \ldots, F_n$ is a finite collection of closed sets in $X$, then
$F_1 \cup F_2 \cup \cdots \cup F_n$ is also closed.

(g) If $\{E_\alpha\}_{\alpha \in I}$ is a collection of open sets in $X$ (where the index set $I$ could
be finite, countable, or uncountable), then the union
$\bigcup_{\alpha \in I} E_\alpha := \{x \in X : x \in E_\alpha \text{ for some } \alpha \in I\}$ is also open. If
$\{F_\alpha\}_{\alpha \in I}$ is a collection of closed sets in $X$, then the intersection
$\bigcap_{\alpha \in I} F_\alpha := \{x \in X : x \in F_\alpha \text{ for all } \alpha \in I\}$ is also closed.

(h) If $E$ is any subset of $X$, then $\mathrm{int}(E)$ is the largest open set which is contained in
$E$; in other words, $\mathrm{int}(E)$ is open, and given any other open set $V \subseteq E$, we have
$V \subseteq \mathrm{int}(E)$. Similarly $\overline{E}$ is the smallest closed set which contains $E$; in other
words, $\overline{E}$ is closed, and given any other closed set $K \supseteq E$, we have $K \supseteq \overline{E}$.


Proof See Exercise 1.2.3. 


— Exercises — 
Exercise 1.2.1 Verify the claims in Example 1.2.8. 


Exercise 1.2.2 Prove Proposition 1.2.10. (Hint: for some of the implications one 
will need the axiom of choice, as in Lemma 8.4.5.) 


Exercise 1.2.3 Prove Proposition 1.2.15. (Hint: you can use earlier parts of the 
proposition to prove later ones.) 


Exercise 1.2.4 Let $(X, d)$ be a metric space, $x_0$ be a point in $X$, and $r > 0$. Let $B$ be
the open ball $B := B(x_0, r) = \{x \in X : d(x, x_0) < r\}$, and let $C$ be the closed ball
$C := \{x \in X : d(x, x_0) \le r\}$.


(a) Show that $\overline{B} \subseteq C$.
(b) Give an example of a metric space $(X, d)$, a point $x_0$, and a radius $r > 0$ such
that $\overline{B}$ is not equal to $C$.


1.3 Relative Topology


When we defined notions such as open and closed sets, we mentioned that such 
concepts depended on the choice of metric one uses. For instance, on the real line R, 
if one uses the usual metric $d(x, y) = |x - y|$, then the set $\{1\}$ is not open; however,
if instead one uses the discrete metric $d_{\mathrm{disc}}$, then $\{1\}$ is now an open set (why?).
However, it is not just the choice of metric which determines what is open and 
what is not—it is also the choice of ambient space X. Here are some examples. 


Example 1.3.1 Consider the plane $\mathbf{R}^2$ with the Euclidean metric $d_{l^2}$. Inside the plane,
we can find the x-axis $X := \{(x, 0) : x \in \mathbf{R}\}$. The metric $d_{l^2}$ can be restricted to $X$,
creating a subspace $(X, d_{l^2}|_{X \times X})$ of $(\mathbf{R}^2, d_{l^2})$. (This subspace is essentially the same
as the real line $(\mathbf{R}, d)$ with the usual metric; the precise way of stating this is that
$(X, d_{l^2}|_{X \times X})$ is isometric to $(\mathbf{R}, d)$. We will not pursue this concept further in this
text, however.) Now consider the set

$$E := \{(x, 0) : -1 < x < 1\}$$

which is both a subset of $X$ and of $\mathbf{R}^2$. Viewed as a subset of $\mathbf{R}^2$, it is not open,
because the point $(0, 0)$, for instance, lies in $E$ but is not an interior point of $E$. (Any
ball $B_{\mathbf{R}^2, d_{l^2}}(0, r)$ will contain at least one point that lies outside of the x-axis, and
hence outside of $E$.) On the other hand, if viewed as a subset of $X$, it is open; every
point of $E$ is an interior point of $E$ with respect to the metric space $(X, d_{l^2}|_{X \times X})$. For
instance, the point $(0, 0)$ is now an interior point of $E$, because the ball
$B_{X, d_{l^2}|_{X \times X}}(0, 1)$ is contained in $E$ (in fact, in this case it is $E$).
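
The two halves of Example 1.3.1 can be spot-checked numerically. The following sketch uses names chosen here (`in_E`, `witness_off_axis`) and samples only finitely many radii and points, so it illustrates the argument rather than proving it. For each radius r it exhibits a point of the planar ball around the origin that lies off the x-axis, and it confirms on sample points that the ball of radius 1 inside the subspace X coincides with E.

```python
import math

def in_E(p):
    """Membership in E = {(x, 0) : -1 < x < 1}."""
    return p[1] == 0 and -1 < p[0] < 1

def witness_off_axis(r):
    """A point of the planar ball B((0,0), r) that is not on the x-axis."""
    return (0.0, r / 2)

# In R^2: every ball around the origin contains a point outside E.
for r in (1.0, 0.1, 0.001):
    w = witness_off_axis(r)
    print(r, math.hypot(w[0], w[1]) < r, in_E(w))

# In X (the x-axis): the ball of radius 1 around (0,0) is {(x,0) : |x| < 1}.
samples = [-2.0, -1.0, -0.5, 0.0, 0.99, 1.0, 3.0]
print(all((abs(x) < 1) == in_E((x, 0.0)) for x in samples))
```

The only difference between the two checks is which points are allowed to compete: once points off the x-axis are excluded from the ambient space, nothing in the ball can escape E.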


Example 1.3.2 Consider the real line $\mathbf{R}$ with the standard metric $d$, and let $X$ be
the interval $X := (-1, 1)$ contained inside $\mathbf{R}$; we can then restrict the metric $d$ to $X$,
creating a subspace $(X, d|_{X \times X})$ of $(\mathbf{R}, d)$. Now consider the set $[0, 1)$. This set is not
closed in $\mathbf{R}$, because the point 1 is adherent to $[0, 1)$ but is not contained in $[0, 1)$.
However, when considered as a subset of $X$, the set $[0, 1)$ now becomes closed; the
point 1 is not an element of $X$ and so is no longer considered an adherent point of
$[0, 1)$, and so now $[0, 1)$ contains all of its adherent points.


To clarify this distinction, we make a definition. 


Definition 1.3.3 (Relative topology) Let $(X, d)$ be a metric space, let $Y$ be a subset
of $X$, and let $E$ be a subset of $Y$. We say that $E$ is relatively open with respect to $Y$
if it is open in the metric subspace $(Y, d|_{Y \times Y})$. Similarly, we say that $E$ is relatively
closed with respect to $Y$ if it is closed in the metric space $(Y, d|_{Y \times Y})$.


The relationship between open (or closed) sets in X, and relatively open (or 
relatively closed) sets in Y, is the following. 


Proposition 1.3.4 Let $(X,d)$ be a metric space, let $Y$ be a subset of $X$, and let $E$
be a subset of $Y$.

(a) $E$ is relatively open with respect to $Y$ if and only if $E = V \cap Y$ for some set
$V \subseteq X$ which is open in $X$.

(b) $E$ is relatively closed with respect to $Y$ if and only if $E = K \cap Y$ for some set
$K \subseteq X$ which is closed in $X$.


Proof We just prove (a) and leave (b) to Exercise 1.3.1. First suppose that $E$ is
relatively open with respect to $Y$. Then, $E$ is open in the metric space $(Y, d|_{Y \times Y})$.
Thus, for every $x \in E$, there exists a radius $r > 0$ such that the ball $B_{(Y, d|_{Y \times Y})}(x, r)$
is contained in $E$. This radius $r$ depends on $x$; to emphasize this we write $r_x$ instead
of $r$, thus for every $x \in E$ the ball $B_{(Y, d|_{Y \times Y})}(x, r_x)$ is contained in $E$. (Note that we
have used the axiom of choice, Proposition 8.4.7, to do this.)
Now consider the set

$$V := \bigcup_{x \in E} B_{(X,d)}(x, r_x).$$

This is a subset of $X$. By Proposition 1.2.15(c) and (g), $V$ is open. Now we prove
that $E = V \cap Y$. Certainly any point $x$ in $E$ lies in $V \cap Y$, since it lies in $Y$ and it
also lies in $B_{(X,d)}(x, r_x)$, and hence in $V$. Now suppose that $y$ is a point in $V \cap Y$.
Then $y \in V$, which implies that there exists an $x \in E$ such that $y \in B_{(X,d)}(x, r_x)$.
But since $y$ is also in $Y$, this implies that $y \in B_{(Y, d|_{Y \times Y})}(x, r_x)$. But by definition of
$r_x$, this means that $y \in E$, as desired. Thus we have found an open set $V$ for which
$E = V \cap Y$ as desired.

Now we do the converse. Suppose that $E = V \cap Y$ for some open set $V$; we have
to show that $E$ is relatively open with respect to $Y$. Let $x$ be any point in $E$; we
have to show that $x$ is an interior point of $E$ in the metric space $(Y, d|_{Y \times Y})$. Since
$x \in E$, we know $x \in V$. Since $V$ is open in $X$, we know that there is a radius $r > 0$
such that $B_{(X,d)}(x, r)$ is contained in $V$. Strictly speaking, $r$ depends on $x$, and so we
could write $r_x$ instead of $r$, but for this argument we will only use a single choice of
$x$ (as opposed to the argument in the previous paragraph) and so we will not bother
to subscript $r$ here. Since $E = V \cap Y$, this means that $B_{(X,d)}(x, r) \cap Y$ is contained
in $E$. But $B_{(X,d)}(x, r) \cap Y$ is exactly the same as $B_{(Y, d|_{Y \times Y})}(x, r)$ (why?), and so
$B_{(Y, d|_{Y \times Y})}(x, r)$ is contained in $E$. Thus $x$ is an interior point of $E$ in the metric space
$(Y, d|_{Y \times Y})$, as desired.


— Exercises — 


Exercise 1.3.1 Prove Proposition 1.3.4(b). 


1.4 Cauchy Sequences and Complete Metric Spaces 


We now generalize much of the theory of limits of sequences from Chap. 6 to the
setting of general metric spaces. We begin by generalizing the notion of a subsequence
from Definition 6.6.1:


Definition 1.4.1 (Subsequences) Suppose that $(x^{(n)})_{n=m}^\infty$ is a sequence of points in a
metric space $(X, d)$. Suppose that $n_1, n_2, n_3, \ldots$ is an increasing sequence of integers
which are at least as large as $m$, thus

$$m \le n_1 < n_2 < n_3 < \cdots.$$

Then we call the sequence $(x^{(n_j)})_{j=1}^\infty$ a subsequence of the original sequence
$(x^{(n)})_{n=m}^\infty$.


Example 1.4.2 The sequence $((\frac{1}{j^2}, \frac{1}{j^2}))_{j=1}^\infty$ in $\mathbf{R}^2$ is a subsequence of the sequence
$((\frac{1}{n}, \frac{1}{n}))_{n=1}^\infty$ (in this case, $n_j := j^2$). The sequence $1, 1, 1, 1, \ldots$ is a subsequence of
$1, 0, 1, 0, 1, \ldots$.


If a sequence converges, then so do all of its subsequences: 


Lemma 1.4.3 Let $(x^{(n)})_{n=m}^\infty$ be a sequence in $(X, d)$ which converges to some limit
$x_0$. Then every subsequence $(x^{(n_j)})_{j=1}^\infty$ of that sequence also converges to $x_0$.


Proof See Exercise 1.4.1. 


On the other hand, it is possible for a subsequence to be convergent without the
sequence as a whole being convergent. For example, the sequence $1, 0, 1, 0, 1, \ldots$ is
not convergent, even though certain subsequences of it (such as $1, 1, 1, \ldots$) converge.
To quantify this phenomenon, we generalize Definition 6.4.1 as follows:


Definition 1.4.4 (Limit points) Suppose that $(x^{(n)})_{n=m}^\infty$ is a sequence of points in a
metric space $(X, d)$, and let $L \in X$. We say that $L$ is a limit point of $(x^{(n)})_{n=m}^\infty$ iff for
every $N \ge m$ and $\varepsilon > 0$ there exists an $n \ge N$ such that $d(x^{(n)}, L) \le \varepsilon$.


Proposition 1.4.5 Let $(x^{(n)})_{n=m}^\infty$ be a sequence of points in a metric space $(X, d)$,
and let $L \in X$. Then the following are equivalent:


• $L$ is a limit point of $(x^{(n)})_{n=m}^\infty$.
• There exists a subsequence $(x^{(n_j)})_{j=1}^\infty$ of the original sequence $(x^{(n)})_{n=m}^\infty$ which
converges to $L$.


Proof See Exercise 1.4.2. 
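
For the alternating sequence 1, 0, 1, 0, ... discussed above, the condition in Definition 1.4.4 can be probed by a bounded search. The sketch below is illustrative only: the names `x` and `has_term_near` and the search window of 1000 terms are assumptions made here, and a finite search can suggest, but never establish, that a point is a limit point.

```python
def x(n):
    """The sequence 1, 0, 1, 0, ... indexed from n = 1."""
    return 1 if n % 2 == 1 else 0

def has_term_near(L, N, eps, search_len=1000):
    """Look for an n with N <= n < N + search_len and |x(n) - L| <= eps."""
    return any(abs(x(n) - L) <= eps for n in range(N, N + search_len))

# Both 0 and 1 pass the test for every starting index tried, 0.5 does not.
for L in (0, 1, 0.5):
    print(L, all(has_term_near(L, N, 0.25) for N in (1, 10, 999)))

# The odd-indexed subsequence n_j = 2j - 1 is constant, hence converges to 1.
print([x(2 * j - 1) for j in range(1, 6)])
```

This matches Proposition 1.4.5: the two values that pass the search are exactly the limits of the odd-indexed and even-indexed subsequences.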


Next, we review the notion of a Cauchy sequence from Definition 6.1.3 (see also 
Definition 5.1.8). 


Definition 1.4.6 (Cauchy sequences) Let $(x^{(n)})_{n=m}^\infty$ be a sequence of points in a
metric space $(X, d)$. We say that this sequence is a Cauchy sequence iff for every
$\varepsilon > 0$, there exists an $N \ge m$ such that $d(x^{(j)}, x^{(k)}) < \varepsilon$ for all $j, k \ge N$.


Lemma 1.4.7 (Convergent sequences are Cauchy sequences) Let $(x^{(n)})_{n=m}^\infty$ be a
sequence in $(X, d)$ which converges to some limit $x_0$. Then $(x^{(n)})_{n=m}^\infty$ is also a Cauchy
sequence.


Proof See Exercise 1.4.3.


It is also easy to check that every subsequence of a Cauchy sequence is also a Cauchy
sequence (why?). However, not every Cauchy sequence converges:


Example 1.4.8 (Informal) Consider the sequence 
3, 3.1, 3.14, 3.141, 3.1415,... 


in the metric space $(\mathbf{Q}, d)$ (the rationals $\mathbf{Q}$ with the usual metric $d(x, y) := |x - y|$).
While this sequence is convergent in $\mathbf{R}$ (it converges to $\pi$), it does not converge in
$\mathbf{Q}$ (since $\pi \notin \mathbf{Q}$, and a sequence cannot converge to two different limits).
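
The Cauchy property of the sequence in Example 1.4.8 can be checked with exact rational arithmetic. The sketch below assumes a hard-coded string of the leading digits of π and uses names (`PI_DIGITS`, `truncation`, `cauchy_on_window`) chosen here; it tests only finitely many terms, so it illustrates Definition 1.4.6 rather than verifying it.

```python
from fractions import Fraction

# Leading decimal digits of pi, 3.14159265358979..., hard-coded for this sketch.
PI_DIGITS = "314159265358979"

def truncation(k):
    """The rational number keeping k digits of pi after the decimal point."""
    return Fraction(int(PI_DIGITS[:k + 1]), 10 ** k)

def cauchy_on_window(N, last=14):
    """Check |x_j - x_k| < 10^(-N) for all N <= j, k <= last."""
    eps = Fraction(1, 10 ** N)
    return all(abs(truncation(j) - truncation(k)) < eps
               for j in range(N, last + 1)
               for k in range(N, last + 1))

print(truncation(0), truncation(2), truncation(4))  # 3, 157/50, 6283/2000
print(all(cauchy_on_window(N) for N in range(0, 15)))
```

Every term is a rational number, so the sequence lives in Q, and two truncations that agree to N decimal places differ by less than 10^(-N). The terms cluster ever more tightly even though the point they cluster around is missing from Q.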


So in certain metric spaces, Cauchy sequences do not necessarily converge. How- 
ever, if even part of a Cauchy sequence converges, then the entire Cauchy sequence 
must converge (to the same limit): 


Lemma 1.4.9 Let $(x^{(n)})_{n=m}^\infty$ be a Cauchy sequence in $(X, d)$. Suppose that there is
some subsequence $(x^{(n_j)})_{j=1}^\infty$ of this sequence which converges to a limit $x_0$ in $X$.
Then the original sequence $(x^{(n)})_{n=m}^\infty$ also converges to $x_0$.


Proof See Exercise 1.4.4. 


In Example 1.4.8 we saw an example of a metric space which contained Cauchy 
sequences which did not converge. However, in Theorem 6.4.18 we saw that in the 
metric space (R, d), every Cauchy sequence did have a limit. This motivates the 
following definition. 


Definition 1.4.10 (Complete metric spaces) A metric space (X, d) is said to be 
complete iff every Cauchy sequence in (X, d) is in fact convergent in (X, d). 


Example 1.4.11 By Theorem 6.4.18, the reals (R, d) are complete; by Example 
1.4.8, the rationals (Q, d), on the other hand, are not complete. 


Complete metric spaces have some nice properties. For instance, they are intrin- 
sically closed: no matter what space one places them in, they are always closed sets. 
More precisely: 


Proposition 1.4.12 (a) Let $(X,d)$ be a metric space, and let $(Y, d|_{Y \times Y})$ be a
subspace of $(X, d)$. If $(Y, d|_{Y \times Y})$ is complete, then $Y$ must be closed in $X$.

(b) Conversely, suppose that $(X, d)$ is a complete metric space, and $Y$ is a closed
subset of $X$. Then the subspace $(Y, d|_{Y \times Y})$ is also complete.


Proof See Exercise 1.4.7. 


In contrast, an incomplete metric space such as $(\mathbf{Q}, d)$ may be considered closed
in some spaces (for instance, $\mathbf{Q}$ is closed in $\mathbf{Q}$) but not in others (for instance, $\mathbf{Q}$ is
not closed in $\mathbf{R}$). Indeed, it turns out that given any incomplete metric space $(X, d)$,
there exists a completion $(\overline{X}, \overline{d})$, which is a larger metric space containing $(X, d)$
which is complete, and such that $X$ is not closed in $\overline{X}$ (indeed, the closure of $X$ in
$(\overline{X}, \overline{d})$ will be all of $\overline{X}$); see Exercise 1.4.8. For instance, one possible completion of
$\mathbf{Q}$ is $\mathbf{R}$.


— Exercises — 
Exercise 1.4.1 Prove Lemma 1.4.3. (Hint: review your proof of Proposition 6.6.5.) 


Exercise 1.4.2 Prove Proposition 1.4.5. (Hint: review your proof of Proposition 
6.6.6.) 


Exercise 1.4.3. Prove Lemma 1.4.7. (Hint: review your proof of Proposition 6.1.12.) 
Exercise 1.4.4 Prove Lemma 1.4.9. 


Exercise 1.4.5 Let $(x^{(n)})_{n=m}^\infty$ be a sequence of points in a metric space $(X, d)$, and
let $L \in X$. Show that if $L$ is a limit point of the sequence $(x^{(n)})_{n=m}^\infty$, then $L$ is an
adherent point of the set $\{x^{(n)} : n \ge m\}$. Is the converse true?
Exercise 1.4.6 Show that every Cauchy sequence can have at most one limit point. 
Exercise 1.4.7 Prove Proposition 1.4.12. 


Exercise 1.4.8 The following construction generalizes the construction of the reals
from the rationals in Chap. 5, allowing one to view any metric space as a subspace
of a complete metric space. In what follows we let $(X, d)$ be a metric space.


(a) Given any Cauchy sequence $(x_n)_{n=1}^\infty$ in $X$, we introduce the formal limit
$\mathrm{LIM}_{n\to\infty} x_n$. We say that two formal limits $\mathrm{LIM}_{n\to\infty} x_n$ and $\mathrm{LIM}_{n\to\infty} y_n$ are
equal if $\lim_{n\to\infty} d(x_n, y_n)$ is equal to zero. Show that this equality relation obeys
the reflexive, symmetry, and transitive axioms.

(b) Let $\overline{X}$ be the space of all formal limits of Cauchy sequences in $X$, with the above
equality relation. Define a metric $d_{\overline{X}} : \overline{X} \times \overline{X} \to [0, +\infty)$ by setting

$$d_{\overline{X}}(\mathrm{LIM}_{n\to\infty} x_n, \mathrm{LIM}_{n\to\infty} y_n) := \lim_{n\to\infty} d(x_n, y_n).$$

Show that this function is well-defined (this means not only that the limit
$\lim_{n\to\infty} d(x_n, y_n)$ exists, but also that the axiom of substitution is obeyed; cf.
Lemma 5.3.7) and gives $\overline{X}$ the structure of a metric space.

(c) Show that the metric space $(\overline{X}, d_{\overline{X}})$ is complete.

(d) We identify an element $x \in X$ with the corresponding formal limit $\mathrm{LIM}_{n\to\infty} x$
in $\overline{X}$; show that this is legitimate by verifying that
$x = y \iff \mathrm{LIM}_{n\to\infty} x = \mathrm{LIM}_{n\to\infty} y$. With this identification, show that
$d(x, y) = d_{\overline{X}}(x, y)$, and thus $(X, d)$ can now be thought of as a subspace of $(\overline{X}, d_{\overline{X}})$.

(e) Show that the closure of $X$ in $\overline{X}$ is $\overline{X}$ (which explains the choice of notation $\overline{X}$).

(f) Show that the formal limit agrees with the actual limit, thus if $(x_n)_{n=1}^\infty$ is any
Cauchy sequence in $X$, then we have $\lim_{n\to\infty} x_n = \mathrm{LIM}_{n\to\infty} x_n$ in $\overline{X}$.


1.5 Compact Metric Spaces


We now come to one of the most useful notions in point-set topology, that of com- 
pactness. Recall the Heine—Borel theorem (Theorem 9.1.24), which asserted that 
every sequence in a closed and bounded subset X of the real line R had a convergent 
subsequence whose limit was also in X. Conversely, only the closed and bounded 
sets have this property. This property turns out to be so useful that we give it a name. 


Definition 1.5.1 (Compactness) A metric space $(X,d)$ is said to be compact iff
every sequence in $(X, d)$ has at least one convergent subsequence. A subset $Y$ of a
metric space $X$ is said to be compact if the subspace $(Y, d|_{Y \times Y})$ is compact.


Remark 1.5.2 The notion of a set $Y$ being compact is intrinsic, in the sense that it
only depends on the metric function $d|_{Y \times Y}$ restricted to $Y$, and not on the choice
of the ambient space $X$. The notions of completeness in Definition 1.4.10, and of
boundedness below in Definition 1.5.3, are also intrinsic, but the notions of open and
closed are not (see the discussion in Sect. 1.3).


Thus, Theorem 9.1.24 shows that in the real line R with the usual metric, every 
closed and bounded set is compact, and conversely every compact set is closed and 
bounded. 

Now we investigate how the Heine–Borel theorem extends to other metric spaces.


Definition 1.5.3 (Bounded sets) Let $(X, d)$ be a metric space, and let $Y$ be a subset
of $X$. We say that $Y$ is bounded iff for every $x \in X$ there exists a ball $B(x, r)$ in $X$
of some finite radius $r$ which contains $Y$. We call the metric space $(X, d)$ bounded
if $X$ is bounded.


Remark 1.5.4 This definition is compatible with the definition of a bounded set in 
Definition 9.1.22 (Exercise 1.5.1). 


Proposition 1.5.5 Let (X, d) be a compact metric space. Then (X, d) is both com- 
plete and bounded. 


Proof See Exercise 1.5.2. 


From this proposition and Proposition 1.4.12(a) we obtain one half of the Heine— 
Borel theorem for general metric spaces: 


Corollary 1.5.6 (Compact sets are closed and bounded) Let (X,d) be a metric 
space, and let Y be a compact subset of X. Then Y is closed and bounded. 


The other half of the Heine—Borel theorem is true in Euclidean spaces: 


Theorem 1.5.7 (Heine–Borel theorem) Let $(\mathbf{R}^n, d)$ be a Euclidean space with either
the Euclidean metric, the taxicab metric, or the sup norm metric. Let $E$ be a subset
of $\mathbf{R}^n$. Then $E$ is compact if and only if it is closed and bounded.


Proof See Exercise 1.5.3.


However, the Heine—Borel theorem is not true for more general metrics. For 
instance, the integers $\mathbf{Z}$ with the discrete metric is closed (indeed, it is complete)
and bounded, but not compact, since the sequence 1, 2,3,4,... is in Z but has no 
convergent subsequence (why?). Another example is in Exercise 1.5.8. However, a 
version of the Heine—Borel theorem is available if one is willing to replace closedness 
with the stronger notion of completeness and boundedness with the stronger notion 
of total boundedness; see Exercise 1.5.10. 
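
The counterexample of the integers with the discrete metric can be examined on a finite stretch of the sequence 1, 2, 3, .... The sketch below uses names chosen here and inspects only the first fifty terms, so it illustrates the obstruction rather than proving it: distinct terms are always at distance exactly 1, while every term sits inside a single ball of radius 2.

```python
from itertools import combinations

def d_disc(x, y):
    """Discrete metric on the integers."""
    return 0 if x == y else 1

terms = range(1, 51)  # the first fifty terms of 1, 2, 3, ...

# Smallest distance between two distinct terms of the sequence.
min_gap = min(d_disc(a, b) for a, b in combinations(terms, 2))
print(min_gap)

# Boundedness: every term lies in the ball B(0, 2).
print(all(d_disc(0, z) < 2 for z in terms))
```

Because distinct terms never come closer than 1, the Cauchy condition fails for ε = 1 on any subsequence, and by Lemma 1.4.7 a sequence that is not Cauchy cannot converge.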

One can characterize compactness topologically via the following, rather strange- 
sounding statement: every open cover of a compact set has a finite subcover. 


Theorem 1.5.8 Let $(X, d)$ be a metric space, and let $Y$ be a compact subset of $X$.
Let $(V_\alpha)_{\alpha \in A}$ be a collection of open sets in $X$, and suppose that

$$Y \subseteq \bigcup_{\alpha \in A} V_\alpha$$

(i.e., the collection $(V_\alpha)_{\alpha \in A}$ covers $Y$). Then there exists a finite subset $F$ of $A$ such
that

$$Y \subseteq \bigcup_{\alpha \in F} V_\alpha.$$


Proof We assume for sake of contradiction that there does not exist any finite subset
$F$ of $A$ for which $Y \subseteq \bigcup_{\alpha \in F} V_\alpha$.

Let $y$ be any element of $Y$. Then $y$ must lie in at least one of the sets $V_\alpha$. Since
each $V_\alpha$ is open, there must therefore be an $r > 0$ such that $B_{(X,d)}(y, r) \subseteq V_\alpha$. Now
let $r(y)$ denote the quantity

$$r(y) := \sup\{r \in (0, \infty) : B_{(X,d)}(y, r) \subseteq V_\alpha \text{ for some } \alpha \in A\}.$$

By the above discussion, we know that $r(y) > 0$ for all $y \in Y$. Now, let $r_0$ denote
the quantity

$$r_0 := \inf\{r(y) : y \in Y\}.$$

Since $r(y) > 0$ for all $y \in Y$, we have $r_0 \ge 0$. There are three cases: $r_0 = 0$,
$0 < r_0 < \infty$, and $r_0 = \infty$.


• Case 1: $r_0 = 0$. Then for every integer $n \ge 1$, there is at least one point $y$ in $Y$
such that $r(y) < 1/n$ (why?). We thus choose, for each $n \ge 1$, a point $y^{(n)}$ in
$Y$ such that $r(y^{(n)}) < 1/n$ (we can do this because of the axiom of choice, see
Proposition 8.4.7). In particular we have $\lim_{n\to\infty} r(y^{(n)}) = 0$, by the squeeze test.
The sequence $(y^{(n)})_{n=1}^\infty$ is a sequence in $Y$; since $Y$ is compact, we can thus find a
subsequence $(y^{(n_j)})_{j=1}^\infty$ which converges to a point $y_0 \in Y$.

As before, we know that there exists some $\alpha \in A$ such that $y_0 \in V_\alpha$, and hence
(since $V_\alpha$ is open) there exists some $\varepsilon > 0$ such that $B(y_0, \varepsilon) \subseteq V_\alpha$. Since $y^{(n_j)}$
converges to $y_0$, there must exist an $N \ge 1$ such that $y^{(n_j)} \in B(y_0, \varepsilon/2)$ for all
$j \ge N$. In particular, by the triangle inequality we have $B(y^{(n_j)}, \varepsilon/2) \subseteq B(y_0, \varepsilon)$, and
thus $B(y^{(n_j)}, \varepsilon/2) \subseteq V_\alpha$. By definition of $r(y^{(n_j)})$, this implies that $r(y^{(n_j)}) \ge \varepsilon/2$
for all $j \ge N$. But this contradicts the fact that $\lim_{n\to\infty} r(y^{(n)}) = 0$.

• Case 2: $0 < r_0 < \infty$. In this case we now have $r(y) > r_0/2$ for all $y \in Y$. This
implies that for every $y \in Y$ there exists an $\alpha \in A$ such that $B(y, r_0/2) \subseteq V_\alpha$
(why?).
We now construct a sequence $y^{(1)}, y^{(2)}, \ldots$ by the following recursive procedure.
We let $y^{(1)}$ be any point in $Y$. The ball $B(y^{(1)}, r_0/2)$ is contained in one of the
$V_\alpha$ and thus cannot cover all of $Y$, since we would then obtain a finite cover, a
contradiction. Thus there exists a point $y^{(2)}$ which does not lie in $B(y^{(1)}, r_0/2)$, so
in particular $d(y^{(2)}, y^{(1)}) \ge r_0/2$. Choose such a point $y^{(2)}$. The set
$B(y^{(1)}, r_0/2) \cup B(y^{(2)}, r_0/2)$ cannot cover all of $Y$, since we would then obtain two sets
$V_{\alpha_1}$ and $V_{\alpha_2}$ which covered $Y$, a contradiction again. So we can choose a point $y^{(3)}$
which does not lie in $B(y^{(1)}, r_0/2) \cup B(y^{(2)}, r_0/2)$, so in particular
$d(y^{(3)}, y^{(1)}) \ge r_0/2$ and $d(y^{(3)}, y^{(2)}) \ge r_0/2$. Continuing in this fashion we obtain a
sequence $(y^{(n)})_{n=1}^\infty$ in $Y$ with the property that $d(y^{(k)}, y^{(j)}) \ge r_0/2$ for all $k > j$. In
particular the sequence $(y^{(n)})_{n=1}^\infty$ is not a Cauchy sequence, and in fact no subsequence
of $(y^{(n)})_{n=1}^\infty$ can be a Cauchy sequence either. But this contradicts the assumption that
$Y$ is compact (by Lemma 1.4.7).

• Case 3: $r_0 = \infty$. For this case we argue as in Case 2, but replacing the role of $r_0/2$
by (say) 1.


It turns out that Theorem 1.5.8 has a converse: if Y has the property that every 
open cover has a finite subcover, then it is compact (Exercise 1.5.11). In fact, this 
property is often considered the more fundamental notion of compactness than the 
sequence-based one. (For metric spaces, the two notions, that of compactness and 
sequential compactness, are equivalent, but for more general topological spaces, the 
two notions are slightly different, though we will not show this here.) 
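
To see why the compactness hypothesis in Theorem 1.5.8 cannot simply be dropped, one can look at the non-compact set $Y = (0, 1)$ in the real line together with the open cover $V_n := (1/n, 1)$, $n = 2, 3, \ldots$. The following Python sketch is our own illustration (the function names `in_V` and `uncovered_point` are hypothetical, not taken from the text); it uses exact rational arithmetic to exhibit, for any finite choice of indices, a point of $Y$ that the chosen sets fail to cover.

```python
from fractions import Fraction

def in_V(n, y):
    """True if y lies in the open interval V_n = (1/n, 1)."""
    return Fraction(1, n) < y < 1

def uncovered_point(indices):
    """Given a finite non-empty list of indices n >= 2, return a point of (0, 1)
    that lies in none of the sets V_n with n in indices."""
    N = max(indices)
    y = Fraction(1, N + 1)   # 0 < y < 1, and y < 1/n for every n <= N
    assert 0 < y < 1
    assert not any(in_V(n, y) for n in indices)
    return y

print(uncovered_point([2, 5, 17]))  # 1/18
```

Every point of $(0, 1)$ lies in some $V_n$, so the $V_n$ do cover $Y$; the sketch only confirms that no finite subfamily does.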

Theorem 1.5.8 has an important corollary: that every nested sequence of non-empty
compact sets has a non-empty intersection.


Corollary 1.5.9 Let $(X, d)$ be a metric space, and let $K_1, K_2, K_3, \ldots$ be a sequence
of non-empty compact subsets of $X$ such that

$$K_1 \supseteq K_2 \supseteq K_3 \supseteq \cdots.$$

Then the intersection $\bigcap_{n=1}^\infty K_n$ is non-empty.


Proof See Exercise 1.5.6. 
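
A simple pair of examples on the real line with the standard metric (our own illustration, not part of the original text) shows both what the corollary asserts and why the word "compact" matters: the sets $E_n$ below are nested, non-empty and bounded, but not closed, and their intersection is empty because no $x > 0$ satisfies $x \leq 1/n$ for every $n$.

```latex
K_n := [0, 1/n] \ \text{(compact)}: \qquad \bigcap_{n=1}^{\infty} K_n = \{0\} \neq \emptyset;
\\
E_n := (0, 1/n] \ \text{(not closed, hence not compact)}: \qquad \bigcap_{n=1}^{\infty} E_n = \emptyset .
```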


We close this section by listing some miscellaneous properties of compact sets. 


Theorem 1.5.10 Let $(X, d)$ be a metric space.

(a) If $Y$ is a compact subset of $X$, and $Z \subseteq Y$, then $Z$ is compact if and only if $Z$ is
closed.

(b) If $Y_1, \ldots, Y_n$ are a finite collection of compact subsets of $X$, then their union
$Y_1 \cup \ldots \cup Y_n$ is also compact.

(c) Every finite subset of X (including the empty set) is compact. 


Proof See Exercise 1.5.7. 


— Exercises — 


Exercise 1.5.1 Show that Definitions 9.1.22 and 1.5.3 match when talking about 
subsets of the real line with the standard metric. 


Exercise 1.5.2 Prove Proposition 1.5.5. (Hint: prove the completeness and bound- 
edness separately. For both claims, use proof by contradiction. You will need the 
axiom of choice, as in Lemma 8.4.5.) 


Exercise 1.5.3. Prove Theorem 1.5.7. (Hint: use Proposition 1.1.18 and Theorem 
9.1.24.) 


Exercise 1.5.4 Let $(\mathbf{R}, d)$ be the real line with the standard metric. Give an example
of a continuous function $f : \mathbf{R} \to \mathbf{R}$, and an open set $V \subseteq \mathbf{R}$, such that the image
$f(V) := \{f(x) : x \in V\}$ of $V$ is not open.


Exercise 1.5.5 Let $(\mathbf{R}, d)$ be the real line with the standard metric. Give an example
of a continuous function $f : \mathbf{R} \to \mathbf{R}$, and a closed set $F \subseteq \mathbf{R}$, such that $f(F)$ is not
closed.


Exercise 1.5.6 Prove Corollary 1.5.9. (Hint: work in the compact metric space
$(K_1, d|_{K_1 \times K_1})$, and consider the sets $V_n := K_1 \backslash K_n$, which are open on $K_1$. Assume
for sake of contradiction that $\bigcap_{n=1}^\infty K_n = \emptyset$, and then apply Theorem 1.5.8.)


Exercise 1.5.7 Prove Theorem 1.5.10. (Hint: for part (c), you may wish to use (b), 
and first prove that every singleton set is compact.) 


Exercise 1.5.8 Let $(X, d_{l^1})$ be the metric space from Exercise 1.1.15. For each natural
number $n$, let $e^{(n)} = (e^{(n)}_j)_{j=0}^\infty$ be the sequence in $X$ such that $e^{(n)}_j := 1$ when
$n = j$ and $e^{(n)}_j := 0$ when $n \neq j$. Show that the set $\{e^{(n)} : n \in \mathbf{N}\}$ is a closed and
bounded subset of $X$, but is not compact. (This is despite the fact that $(X, d_{l^1})$ is even
a complete metric space, a fact which we will not prove here. The problem is
not that $X$ is incomplete, but rather that it is "infinite-dimensional", in a sense that
we will not discuss here.)


Exercise 1.5.9 Show that a metric space (X,d) is compact if and only if every 
sequence in X has at least one limit point. 


Exercise 1.5.10 A metric space $(X, d)$ is called totally bounded if for every $\varepsilon > 0$,
there exists a natural number $n$ and a finite number of balls $B(x^{(1)}, \varepsilon), \ldots, B(x^{(n)}, \varepsilon)$
which cover $X$ (i.e., $X = \bigcup_{i=1}^n B(x^{(i)}, \varepsilon)$).




(a) Show that every totally bounded space is bounded. 

(b) Show the following stronger version of Proposition 1.5.5: if $(X, d)$ is compact,
then it is complete and totally bounded. (Hint: if $X$ is not totally bounded, then there
is some $\varepsilon > 0$ such that $X$ cannot be covered by finitely many $\varepsilon$-balls. Then
use Exercise 8.5.20 to find an infinite sequence of balls $B(x^{(n)}, \varepsilon/2)$ which are
disjoint from each other. Use this to then construct a sequence which has no
convergent subsequence.)

(c) Conversely, show that if $X$ is complete and totally bounded, then $X$ is compact.
(Hint: if $(x^{(n)})_{n=1}^\infty$ is a sequence in $X$, use the total boundedness hypothesis
to recursively construct a sequence of subsequences $(x^{(n;j)})_{n=1}^\infty$ of $(x^{(n)})_{n=1}^\infty$
for each positive integer $j$, such that for each $j$, the elements of the sequence
$(x^{(n;j)})_{n=1}^\infty$ are contained in a single ball of radius $1/j$, and also that each sequence
$(x^{(n;j+1)})_{n=1}^\infty$ is a subsequence of the previous one $(x^{(n;j)})_{n=1}^\infty$. Then show that
the "diagonal" sequence $(x^{(n;n)})_{n=1}^\infty$ is a Cauchy sequence, and then use the
completeness hypothesis.)


Exercise 1.5.11 Let $(X, d)$ have the property that every open cover of $X$ has a finite
subcover. Show that $X$ is compact. (Hint: if $X$ is not compact, then by Exercise 1.5.9,
there is a sequence $(x^{(n)})_{n=1}^\infty$ with no limit points. Then for every $x \in X$ there exists
a ball $B(x, \varepsilon)$ containing $x$ which contains at most finitely many elements of this
sequence. Now use the hypothesis.)


Exercise 1.5.12 Let $(X, d_{disc})$ be a metric space with the discrete metric $d_{disc}$.


(a) Show that X is always complete. 

(b) When is $X$ compact, and when is $X$ not compact? Prove your claim. (Hint: the
Heine-Borel theorem will be useless here since that only applies to Euclidean
spaces.)


Exercise 1.5.13 Let $E$ and $F$ be two compact subsets of $\mathbf{R}$ (with the standard metric
$d(x, y) = |x - y|$). Show that the Cartesian product $E \times F := \{(x, y) : x \in E, y \in F\}$
is a compact subset of $\mathbf{R}^2$ (with the Euclidean metric $d_{l^2}$).


Exercise 1.5.14 Let $(X, d)$ be a metric space, let $E$ be a non-empty compact subset
of $X$, and let $x_0$ be a point in $X$. Show that there exists a point $x \in E$ such that

$$d(x_0, x) = \inf\{d(x_0, y) : y \in E\},$$

i.e., $x$ is the closest point in $E$ to $x_0$. (Hint: let $R$ be the quantity $R := \inf\{d(x_0, y) :
y \in E\}$. Construct a sequence $(x^{(n)})_{n=1}^\infty$ in $E$ such that $d(x_0, x^{(n)}) \leq R + \frac{1}{n}$, and then
use the compactness of $E$.)


Exercise 1.5.15 Let $(X, d)$ be a compact metric space. Suppose that $(K_\alpha)_{\alpha \in I}$ is a
collection of closed sets in $X$ with the property that any finite subcollection of these
sets necessarily has non-empty intersection, thus $\bigcap_{\alpha \in F} K_\alpha \neq \emptyset$ for all finite $F \subseteq I$.
(This property is known as the finite intersection property.) Show that the entire
collection has non-empty intersection, thus $\bigcap_{\alpha \in I} K_\alpha \neq \emptyset$. Show by counterexample
that this statement fails if $X$ is not compact.


Chapter 2
Continuous Functions on Metric Spaces


2.1 Continuous Functions 


In the previous chapter we studied a single metric space (X, d), and the various types 
of sets one could find in that space. While this is already quite a rich subject, the 
theory of metric spaces becomes even richer, and of more importance to analysis, 
when one considers not just a single metric space, but rather pairs (X, dx) and (Y, dy) 
of metric spaces, as well as continuous functions $f : X \to Y$ between such spaces.
To define this concept, we generalize Definition 9.4.1 as follows: 


Definition 2.1.1 (Continuous functions) Let $(X, d_X)$ be a metric space, and let
$(Y, d_Y)$ be another metric space, and let $f : X \to Y$ be a function. If $x_0 \in X$, we
say that $f$ is continuous at $x_0$ iff for every $\varepsilon > 0$, there exists a $\delta > 0$ such that
$d_Y(f(x), f(x_0)) < \varepsilon$ whenever $d_X(x, x_0) < \delta$. We say that $f$ is continuous iff it is
continuous at every point $x \in X$.
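
As a sanity check of this definition in the simplest setting (both spaces equal to the real line with the standard metric), the following Python sketch takes $f(x) = x^2$ and the explicit choice $\delta = \min(1, \varepsilon/(2|x_0| + 1))$, then samples points within $\delta$ of $x_0$. The function names and the particular formula for $\delta$ are our own assumptions for illustration; random sampling is of course only a spot check, not a proof.

```python
import random

def delta_for_square(x0, eps):
    """A delta that works for f(x) = x**2 at x0: if |x - x0| < delta <= 1,
    then |x + x0| < 2|x0| + 1, so |x**2 - x0**2| < delta * (2|x0| + 1) <= eps."""
    return min(1.0, eps / (2 * abs(x0) + 1))

def check(x0, eps, trials=10_000, seed=0):
    """Sample points within delta of x0 and confirm |f(x) - f(x0)| < eps."""
    rng = random.Random(seed)
    delta = delta_for_square(x0, eps)
    for _ in range(trials):
        # Scale by 0.999 so the sample stays strictly inside the open ball.
        x = x0 + 0.999 * rng.uniform(-delta, delta)
        if abs(x * x - x0 * x0) >= eps:
            return False
    return True

print(check(3.0, 0.01))
```

Note that the $\delta$ chosen here depends on $x_0$ as well as on $\varepsilon$, which the definition permits.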


Remark 2.1.2 Continuous functions are also sometimes called continuous maps. 
Mathematically, there is no distinction between the two terminologies. 


Remark 2.1.3 If $f : X \to Y$ is continuous, and $K$ is any subset of $X$, then the
restriction $f|_K : K \to Y$ of $f$ to $K$ is also continuous (why?).


We now generalize much of the discussion in Chap. 9. We first observe that con- 
tinuous functions preserve convergence: 


Theorem 2.1.4 (Continuity preserves convergence) Suppose that $(X, d_X)$ and
$(Y, d_Y)$ are metric spaces. Let $f : X \to Y$ be a function, and let $x_0 \in X$ be a point
in $X$. Then the following three statements are logically equivalent:

(a) $f$ is continuous at $x_0$.

(b) Whenever $(x^{(n)})_{n=1}^\infty$ is a sequence in $X$ which converges to $x_0$ with respect to
the metric $d_X$, the sequence $(f(x^{(n)}))_{n=1}^\infty$ converges to $f(x_0)$ with respect to the
metric $d_Y$.

(c) For every open set $V \subseteq Y$ that contains $f(x_0)$, there exists an open set $U \subseteq X$
containing $x_0$ such that $f(U) \subseteq V$.


Proof See Exercise 2.1.1. □

Another important characterization of continuous functions involves open sets. 


Theorem 2.1.5 Let $(X, d_X)$ be a metric space, and let $(Y, d_Y)$ be another metric
space. Let $f : X \to Y$ be a function. Then the following four statements are equivalent:

(a) $f$ is continuous.

(b) Whenever $(x^{(n)})_{n=1}^\infty$ is a sequence in $X$ which converges to some point $x_0 \in X$
with respect to the metric $d_X$, the sequence $(f(x^{(n)}))_{n=1}^\infty$ converges to $f(x_0)$ with
respect to the metric $d_Y$.

(c) Whenever $V$ is an open set in $Y$, the set $f^{-1}(V) := \{x \in X : f(x) \in V\}$ is an
open set in $X$.

(d) Whenever $F$ is a closed set in $Y$, the set $f^{-1}(F) := \{x \in X : f(x) \in F\}$ is a
closed set in $X$.


Proof See Exercise 2.1.2. □

Remark 2.1.6 It may seem strange that continuity ensures that the inverse image of 


an open set is open. One may guess instead that the reverse should be true, that the 
forward image of an open set is open; but this is not true; see Exercises 1.5.4, 1.5.5. 


As a quick corollary of the above two theorems we obtain

Corollary 2.1.7 (Continuity preserved by composition) Let $(X, d_X)$, $(Y, d_Y)$, and
$(Z, d_Z)$ be metric spaces.


(a) If $f : X \to Y$ is continuous at a point $x_0 \in X$, and $g : Y \to Z$ is continuous at
$f(x_0)$, then the composition $g \circ f : X \to Z$, defined by $g \circ f(x) := g(f(x))$,
is continuous at $x_0$.

(b) If $f : X \to Y$ is continuous, and $g : Y \to Z$ is continuous, then $g \circ f : X \to Z$
is also continuous.


Proof See Exercise 2.1.3. □


Example 2.1.8 If $f : X \to \mathbf{R}$ is a continuous function, then the function $f^2 : X \to
\mathbf{R}$ defined by $f^2(x) := f(x)^2$ is automatically continuous also. This is because we
have $f^2 = g \circ f$, where $g : \mathbf{R} \to \mathbf{R}$ is the squaring function $g(x) := x^2$, and $g$ is a
continuous function.


— Exercises — 
Exercise 2.1.1 Prove Theorem 2.1.4. (Hint: review your proof of Proposition 9.4.7.) 


Exercise 2.1.2 Prove Theorem 2.1.5. (Hint: Theorem 2.1.4 already shows that (a) 
and (b) are equivalent.) 


Exercise 2.1.3 Use Theorem 2.1.4 and Theorem 2.1.5 to prove Corollary 2.1.7.


Exercise 2.1.4 Give an example of functions $f : \mathbf{R} \to \mathbf{R}$ and $g : \mathbf{R} \to \mathbf{R}$ such that


(a) $f$ is not continuous, but $g$ and $g \circ f$ are continuous.
(b) $g$ is not continuous, but $f$ and $g \circ f$ are continuous.
(c) $f$ and $g$ are not continuous, but $g \circ f$ is continuous.


Explain briefly why these examples do not contradict Corollary 2.1.7. 


Exercise 2.1.5 Let $(X, d)$ be a metric space, and let $(E, d|_{E \times E})$ be a subspace of
$(X, d)$. Let $\iota_{E \to X} : E \to X$ be the inclusion map, defined by setting $\iota_{E \to X}(x) := x$
for all $x \in E$. Show that $\iota_{E \to X}$ is continuous.


Exercise 2.1.6 Let $f : X \to Y$ be a function from one metric space $(X, d_X)$ to
another $(Y, d_Y)$. Let $E$ be a subset of $X$ (which we give the induced metric $d_X|_{E \times E}$),
and let $f|_E : E \to Y$ be the restriction of $f$ to $E$, thus $f|_E(x) := f(x)$ when $x \in E$.
If $x_0 \in E$ and $f$ is continuous at $x_0$, show that $f|_E$ is also continuous at $x_0$. (Is the
converse of this statement true? Explain.) Conclude that if $f$ is continuous, then
$f|_E$ is continuous. Thus restriction of the domain of a function does not destroy
continuity. (Hint: use Exercise 2.1.5.)


Exercise 2.1.7 Let $f : X \to Y$ be a function from one metric space $(X, d_X)$ to
another $(Y, d_Y)$. Suppose that the image $f(X)$ of $X$ is contained in some subset
$E \subseteq Y$ of $Y$. Let $g : X \to E$ be the function which is the same as $f$ but with the
codomain restricted from $Y$ to $E$, thus $g(x) = f(x)$ for all $x \in X$. We give $E$ the
metric $d_Y|_{E \times E}$ induced from $Y$. Show that for any $x_0 \in X$, $f$ is continuous at
$x_0$ if and only if $g$ is continuous at $x_0$. Conclude that $f$ is continuous if and only
if $g$ is continuous. (Thus the notion of continuity is not affected if one restricts the
codomain of the function.)


2.2 Continuity and Product Spaces 


Given two functions $f : X \to Y$ and $g : X \to Z$, one can define their pairing
$(f, g) : X \to Y \times Z$ by $(f, g)(x) := (f(x), g(x))$, i.e., this is the function
taking values in the Cartesian product $Y \times Z$ whose first coordinate is $f(x)$ and
whose second coordinate is $g(x)$ (cf. Exercise 3.5.7). For instance, if $f : \mathbf{R} \to \mathbf{R}$
is the function $f(x) := x^2 + 3$, and $g : \mathbf{R} \to \mathbf{R}$ is the function $g(x) := 4x$, then
$(f, g) : \mathbf{R} \to \mathbf{R}^2$ is the function $(f, g)(x) := (x^2 + 3, 4x)$. The pairing operation
preserves continuity:


Lemma 2.2.1 Let $f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$ be functions, and let $(f, g) : X \to \mathbf{R}^2$
be their direct sum. We give $\mathbf{R}^2$ the Euclidean metric.


(a) If $x_0 \in X$, then $f$ and $g$ are both continuous at $x_0$ if and only if $(f, g)$ is continuous
at $x_0$.
(b) $f$ and $g$ are both continuous if and only if $(f, g)$ is continuous.


Proof See Exercise 2.2.1. □

To use this, we first need another continuity result: 


Lemma 2.2.2 The addition function $(x, y) \mapsto x + y$, the subtraction function
$(x, y) \mapsto x - y$, the multiplication function $(x, y) \mapsto xy$, the maximum function
$(x, y) \mapsto \max(x, y)$, and the minimum function $(x, y) \mapsto \min(x, y)$ are all continuous
functions from $\mathbf{R}^2$ to $\mathbf{R}$. The division function $(x, y) \mapsto x/y$ is a continuous function
from $\mathbf{R} \times (\mathbf{R} \backslash \{0\}) = \{(x, y) \in \mathbf{R}^2 : y \neq 0\}$ to $\mathbf{R}$. For any real number $c$, the
function $x \mapsto cx$ is a continuous function from $\mathbf{R}$ to $\mathbf{R}$.


Proof See Exercise 2.2.2. □

Combining these lemmas we obtain 


Corollary 2.2.3 Let $(X, d)$ be a metric space, and let $f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$
be functions. Let $c$ be a real number.


(a) If $x_0 \in X$ and $f$ and $g$ are continuous at $x_0$, then the functions $f + g : X \to \mathbf{R}$,
$f - g : X \to \mathbf{R}$, $fg : X \to \mathbf{R}$, $\max(f, g) : X \to \mathbf{R}$, $\min(f, g) : X \to \mathbf{R}$, and
$cf : X \to \mathbf{R}$ (see Definition 9.2.1 for definitions) are also continuous at $x_0$. If
$g(x) \neq 0$ for all $x \in X$, then $f/g : X \to \mathbf{R}$ is also continuous at $x_0$.

(b) If $f$ and $g$ are continuous, then the functions $f + g : X \to \mathbf{R}$, $f - g : X \to \mathbf{R}$,
$fg : X \to \mathbf{R}$, $\max(f, g) : X \to \mathbf{R}$, $\min(f, g) : X \to \mathbf{R}$, and $cf : X \to \mathbf{R}$ are
also continuous. If $g(x) \neq 0$ for all $x \in X$, then $f/g : X \to \mathbf{R}$ is also
continuous.


Proof We first prove (a). Since $f$ and $g$ are continuous at $x_0$, then by Lemma 2.2.1
$(f, g) : X \to \mathbf{R}^2$ is also continuous at $x_0$. On the other hand, from Lemma 2.2.2
the function $(x, y) \mapsto x + y$ is continuous at every point in $\mathbf{R}^2$ and in particular is
continuous at $(f, g)(x_0)$. If we then compose these two functions using Corollary
2.1.7 we conclude that $f + g : X \to \mathbf{R}$ is continuous at $x_0$. A similar argument gives the
continuity of $f - g$, $fg$, $\max(f, g)$, $\min(f, g)$, and $cf$ at $x_0$. To prove the claim for $f/g$,
we first use Exercise 2.1.7 to restrict the codomain of $g$ from $\mathbf{R}$ to $\mathbf{R} \backslash \{0\}$, and then
one can argue as before. The claim (b) follows immediately from (a). □


This corollary allows us to demonstrate the continuity of a large class of functions; 
we give some examples below. 
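
The way the proof of Corollary 2.2.3 assembles $f + g$ (pair first, then compose with addition) can be mirrored directly in code. The following Python sketch is only an illustration of that construction, using the functions $f(x) = x^2 + 3$ and $g(x) = 4x$ from the start of this section; the helper names `pair`, `compose`, `add` and `mul` are our own and hypothetical.

```python
def pair(f, g):
    """The pairing (f, g): x -> (f(x), g(x))."""
    return lambda x: (f(x), g(x))

def compose(outer, inner):
    """The composition outer o inner: x -> outer(inner(x))."""
    return lambda x: outer(inner(x))

add = lambda p: p[0] + p[1]        # (x, y) -> x + y, as in Lemma 2.2.2
mul = lambda p: p[0] * p[1]        # (x, y) -> x * y, as in Lemma 2.2.2

f = lambda x: x**2 + 3             # example functions from the text
g = lambda x: 4 * x

f_plus_g = compose(add, pair(f, g))    # x -> x^2 + 4x + 3
f_times_g = compose(mul, pair(f, g))   # x -> 4x^3 + 12x

print(pair(f, g)(2), f_plus_g(2), f_times_g(2))   # (7, 8) 15 56
```

Each building block is continuous (by Lemma 2.2.1 and Lemma 2.2.2), so Corollary 2.1.7 gives continuity of the composites without any further $\varepsilon$-$\delta$ work.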


— Exercises — 


Exercise 2.2.1 Prove Lemma 2.2.1. (Hint: use Proposition 1.1.18 and Theorem 
2.1.4.) 


Exercise 2.2.2 Prove Lemma 2.2.2. (Hint: use Theorem 2.1.5 and limit laws (The- 
orem 6.1.19).) 


Exercise 2.2.3 Show that if $f : X \to \mathbf{R}$ is a continuous function, so is the function
$|f| : X \to \mathbf{R}$ defined by $|f|(x) := |f(x)|$.


Exercise 2.2.4 Let $\pi_1 : \mathbf{R}^2 \to \mathbf{R}$ and $\pi_2 : \mathbf{R}^2 \to \mathbf{R}$ be the functions $\pi_1(x, y) := x$
and $\pi_2(x, y) := y$ (these two functions are sometimes called the coordinate functions
on $\mathbf{R}^2$). Show that $\pi_1$ and $\pi_2$ are continuous. Conclude that if $f : \mathbf{R} \to X$ is any
continuous function into a metric space $(X, d)$, then the functions $g_1 : \mathbf{R}^2 \to X$ and
$g_2 : \mathbf{R}^2 \to X$ defined by $g_1(x, y) := f(x)$ and $g_2(x, y) := f(y)$ are also continuous.


Exercise 2.2.5 Let $n, m \geq 0$ be integers. Suppose that for every $0 \leq i \leq n$ and
$0 \leq j \leq m$ we have a real number $c_{ij}$. Form the function $P : \mathbf{R}^2 \to \mathbf{R}$ defined by

$$P(x, y) := \sum_{i=0}^{n} \sum_{j=0}^{m} c_{ij} x^i y^j.$$

(Such a function is known as a polynomial of two variables; a typical example of such
a polynomial is $P(x, y) = x^3 + 2xy^2 - x^2 + 3y + 6$.) Show that $P$ is continuous.
(Hint: use Exercise 2.2.4 and Corollary 2.2.3.) Conclude that if $f : X \to \mathbf{R}$ and
$g : X \to \mathbf{R}$ are continuous functions, then the function $P(f, g) : X \to \mathbf{R}$ defined by
$P(f, g)(x) := P(f(x), g(x))$ is also continuous.


Exercise 2.2.6 Let $\mathbf{R}^m$ and $\mathbf{R}^n$ be Euclidean spaces. If $f : X \to \mathbf{R}^m$ and $g : X \to \mathbf{R}^n$
are continuous functions, show that $(f, g) : X \to \mathbf{R}^{m+n}$ is also continuous, where we
have identified $\mathbf{R}^m \times \mathbf{R}^n$ with $\mathbf{R}^{m+n}$ in the obvious manner. Is the converse statement
true?


Exercise 2.2.7 Let $k \geq 1$, let $I$ be a finite subset of $\mathbf{N}^k$, and let $c : I \to \mathbf{R}$ be a
function. Form the function $P : \mathbf{R}^k \to \mathbf{R}$ defined by

$$P(x_1, \ldots, x_k) := \sum_{(i_1, \ldots, i_k) \in I} c(i_1, \ldots, i_k) \, x_1^{i_1} \cdots x_k^{i_k}.$$

(Such a function is known as a polynomial of $k$ variables; a typical example of
such a polynomial is $P(x_1, x_2, x_3) = 3x_1^3 x_2 x_3^2 - x_2 x_3^2 + x_1 + 5$.) Show that $P$ is
continuous. (Hint: use induction on $k$, Exercise 2.2.6, and either Exercise 2.2.5 or
Lemma 2.2.2.)


Exercise 2.2.8 Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces. Define the metric
$d_{X \times Y} : (X \times Y) \times (X \times Y) \to [0, \infty)$ by the formula

$$d_{X \times Y}((x, y), (x', y')) := d_X(x, x') + d_Y(y, y').$$

Show that $(X \times Y, d_{X \times Y})$ is a metric space, and deduce an analogue of Proposition
1.1.18 and Lemma 2.2.1.


Exercise 2.2.9 Let $f : \mathbf{R}^2 \to \mathbf{R}$ be a function from $\mathbf{R}^2$ to $\mathbf{R}$. Let $(x_0, y_0)$ be a point
in $\mathbf{R}^2$. If $f$ is continuous at $(x_0, y_0)$, show that

$$\lim_{x \to x_0} \limsup_{y \to y_0} f(x, y) = \lim_{y \to y_0} \limsup_{x \to x_0} f(x, y) = f(x_0, y_0)$$

and

$$\lim_{x \to x_0} \liminf_{y \to y_0} f(x, y) = \lim_{y \to y_0} \liminf_{x \to x_0} f(x, y) = f(x_0, y_0).$$

(Recall that $\limsup_{x \to x_0} f(x) := \inf_{r > 0} \sup_{|x - x_0| < r} f(x)$ and
$\liminf_{x \to x_0} f(x) := \sup_{r > 0} \inf_{|x - x_0| < r} f(x)$.) In particular, we have

$$\lim_{x \to x_0} \lim_{y \to y_0} f(x, y) = \lim_{y \to y_0} \lim_{x \to x_0} f(x, y)$$

whenever the limits on both sides exist. (Note that the limits do not necessarily exist in
general; consider for instance the function $f : \mathbf{R}^2 \to \mathbf{R}$ such that $f(x, y) = y \sin\frac{1}{x}$
when $xy \neq 0$ and $f(x, y) = 0$ otherwise.) Discuss the comparison between this
result and Example 1.2.7.


Exercise 2.2.10 Let $f : \mathbf{R}^2 \to \mathbf{R}$ be a continuous function. Show that for each $x \in
\mathbf{R}$, the function $y \mapsto f(x, y)$ is continuous on $\mathbf{R}$, and for each $y \in \mathbf{R}$, the function
$x \mapsto f(x, y)$ is continuous on $\mathbf{R}$. Thus a function $f(x, y)$ which is jointly continuous
in $(x, y)$ is also continuous in each variable $x, y$ separately.


Exercise 2.2.11 Let $f : \mathbf{R}^2 \to \mathbf{R}$ be the function defined by $f(x, y) := \frac{xy}{x^2 + y^2}$ when
$(x, y) \neq (0, 0)$, and $f(x, y) = 0$ otherwise. Show that for each fixed $x \in \mathbf{R}$, the
function $y \mapsto f(x, y)$ is continuous on $\mathbf{R}$, and that for each fixed $y \in \mathbf{R}$, the function
$x \mapsto f(x, y)$ is continuous on $\mathbf{R}$, but that the function $f : \mathbf{R}^2 \to \mathbf{R}$ is not continuous
on $\mathbf{R}^2$. This shows that the converse to Exercise 2.2.10 fails; it is possible to be
continuous in each variable separately without being jointly continuous.


Exercise 2.2.12 Let $f : \mathbf{R}^2 \to \mathbf{R}$ be the function defined by $f(x, y) := x^2/y$ when
$y \neq 0$, and $f(x, y) := 0$ when $y = 0$. Show that $\lim_{t \to 0} f(tx, ty) = f(0, 0)$ for
every $(x, y) \in \mathbf{R}^2$, but that $f$ is not continuous at the origin. Thus being continuous
on every line through the origin is not enough to guarantee continuity at the origin!


2.3 Continuity and Compactness 


Continuous functions interact well with the concept of compact sets defined in Def- 
inition 1.5.1. 


Theorem 2.3.1 (Continuous maps preserve compactness) Let $f : X \to Y$ be a continuous
map from one metric space $(X, d_X)$ to another $(Y, d_Y)$. Let $K \subseteq X$ be any
compact subset of $X$. Then the image $f(K) := \{f(x) : x \in K\}$ of $K$ is also compact.


Proof See Exercise 2.3.1. □


This theorem has an important consequence. Recall from Definition 9.6.5 the
notion of a function $f : X \to \mathbf{R}$ attaining a maximum or minimum at a point. We
may generalize Proposition 9.6.7 as follows:

Proposition 2.3.2 (Maximum principle) Let $(X, d)$ be a compact metric space, and
let $f : X \to \mathbf{R}$ be a continuous function. Then $f$ is bounded. Furthermore, if $X$ is
non-empty, then $f$ attains its maximum at some point $x_{max} \in X$ and also attains its
minimum at some point $x_{min} \in X$.


Proof See Exercise 2.3.2. □


Remark 2.3.3 As was already noted in Exercise 9.6.1, this principle can fail if X is 
not compact. This proposition should be compared with Lemma 9.6.3 and Proposition 
9.6.7. 


Another advantage of continuous functions on compact sets is that they are uni- 
formly continuous. We generalize Definition 9.9.2 as follows: 


Definition 2.3.4 (Uniform continuity) Let $f : X \to Y$ be a map from one metric
space $(X, d_X)$ to another $(Y, d_Y)$. We say that $f$ is uniformly continuous if, for every
$\varepsilon > 0$, there exists a $\delta > 0$ such that $d_Y(f(x), f(x')) < \varepsilon$ whenever $x, x' \in X$ are
such that $d_X(x, x') < \delta$.


Every uniformly continuous function is continuous, but not conversely (Exercise 
2.3.3). But if the domain X is compact, then the two notions are equivalent: 


Theorem 2.3.5 Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, and suppose that $(X, d_X)$
is compact. If $f : X \to Y$ is a function, then $f$ is continuous if and only if it is uniformly
continuous.


Proof If $f$ is uniformly continuous then it is also continuous by Exercise 2.3.3.
Now suppose that $f$ is continuous. Fix $\varepsilon > 0$. For every $x_0 \in X$, the function $f$
is continuous at $x_0$. Thus there exists a $\delta(x_0) > 0$, depending on $x_0$, such that
$d_Y(f(x), f(x_0)) < \varepsilon/2$ whenever $d_X(x, x_0) < \delta(x_0)$. In particular, by the triangle
inequality this implies that $d_Y(f(x), f(x')) < \varepsilon$ whenever $x \in B_{(X,d_X)}(x_0, \delta(x_0)/2)$
and $d_X(x', x) < \delta(x_0)/2$ (why?).

Now consider the (possibly infinite) collection of balls

$$\{B_{(X,d_X)}(x_0, \delta(x_0)/2) : x_0 \in X\}.$$

Each ball in this collection is of course open, and the union of all these balls covers
$X$, since each point $x_0$ in $X$ is contained in its own ball $B_{(X,d_X)}(x_0, \delta(x_0)/2)$. Hence,
by Theorem 1.5.8, there exist a finite number of points $x_1, \ldots, x_n$ such that the balls
$B_{(X,d_X)}(x_j, \delta(x_j)/2)$ for $j = 1, \ldots, n$ cover $X$:

$$X \subseteq \bigcup_{j=1}^{n} B_{(X,d_X)}(x_j, \delta(x_j)/2).$$

Now let $\delta := \min_{j=1}^{n} \delta(x_j)/2$. Since each of the $\delta(x_j)$ is positive, and there are only
a finite number of $j$, we see that $\delta > 0$. Now let $x, x'$ be any two points in $X$ such that
$d_X(x, x') < \delta$. Since the balls $B_{(X,d_X)}(x_j, \delta(x_j)/2)$ cover $X$, we see that there must
exist $1 \leq j \leq n$ such that $x \in B_{(X,d_X)}(x_j, \delta(x_j)/2)$. Since $d_X(x, x') < \delta$, we have
$d_X(x, x') < \delta(x_j)/2$, and so by the previous discussion we have $d_Y(f(x), f(x')) < \varepsilon$.
We have thus found a $\delta$ such that $d_Y(f(x), f(x')) < \varepsilon$ whenever $d_X(x, x') < \delta$,
and this proves uniform continuity as desired. □
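
For a concrete instance of Theorem 2.3.5, consider the square root function on the compact interval $[0, 1]$. It is continuous, so the theorem guarantees a single $\delta$ for each $\varepsilon$; in this particular case one can even take $\delta = \varepsilon^2$, since $|\sqrt{x} - \sqrt{x'}| \leq \sqrt{|x - x'|}$. The Python sketch below is our own illustration (the name `worst_gap` and the grid-based check are assumptions, and a finite grid is a spot check rather than a proof); it measures the largest gap $|f(x) - f(x')|$ over grid points closer than $\delta$.

```python
import math

def worst_gap(f, delta, steps=1600):
    """Largest |f(x) - f(x')| over grid points x < x' of [0, 1] with x' - x < delta."""
    xs = [i / steps for i in range(steps + 1)]
    worst = 0.0
    for i, x in enumerate(xs):
        j = i + 1
        while j <= steps and xs[j] - x < delta:
            worst = max(worst, abs(f(xs[j]) - f(x)))
            j += 1
    return worst

eps = 0.25
delta = eps ** 2     # one delta for the whole interval, independent of the base point
print(worst_gap(math.sqrt, delta) < eps)
```

The point of the example is that $\delta$ depends only on $\varepsilon$, not on where in $[0, 1]$ the two points sit, even though the square root function becomes arbitrarily steep near 0.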


— Exercises — 
Exercise 2.3.1 Prove Theorem 2.3.1. 


Exercise 2.3.2 Prove Proposition 2.3.2. (Hint: modify the proof of Proposition 
9.6.7.) 


Exercise 2.3.3. Show that every uniformly continuous function is continuous, but 
give an example that shows that not every continuous function is uniformly contin- 
uous. 


Exercise 2.3.4 Let $(X, d_X)$, $(Y, d_Y)$, $(Z, d_Z)$ be metric spaces, and let $f : X \to Y$
and $g : Y \to Z$ be two uniformly continuous functions. Show that $g \circ f : X \to Z$ is
also uniformly continuous.


Exercise 2.3.5 Let $(X, d_X)$ be a metric space, and let $f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$
be uniformly continuous functions. Show that the pairing $(f, g) : X \to \mathbf{R}^2$ defined
by $(f, g)(x) := (f(x), g(x))$ is uniformly continuous.


Exercise 2.3.6 Show that the addition function $(x, y) \mapsto x + y$ and the subtraction
function $(x, y) \mapsto x - y$ are uniformly continuous from $\mathbf{R}^2$ to $\mathbf{R}$, but the multiplication
function $(x, y) \mapsto xy$ is not. Conclude that if $f : X \to \mathbf{R}$ and $g : X \to \mathbf{R}$
are uniformly continuous functions on a metric space $(X, d)$, then $f + g : X \to \mathbf{R}$
and $f - g : X \to \mathbf{R}$ are also uniformly continuous. Give an example to show that
$fg : X \to \mathbf{R}$ need not be uniformly continuous. What is the situation for $\max(f, g)$,
$\min(f, g)$, $f/g$, and $cf$ for a real number $c$?


2.4 Continuity and Connectedness 


We now describe another important concept in metric spaces, that of connectedness. 


Definition 2.4.1 (Connected spaces) Let $(X, d)$ be a metric space. We say that $X$
is disconnected iff there exist disjoint non-empty open sets $V$ and $W$ in $X$ such that
$V \cup W = X$. (Equivalently, $X$ is disconnected if and only if $X$ contains a non-empty
proper subset which is simultaneously closed and open.) We say that $X$ is connected
iff it is non-empty and not disconnected.


We declare the empty set $\emptyset$ as being special: it is neither connected nor
disconnected; one could think of the empty set as "unconnected".

Example 2.4.2 Consider the set $X := [1, 2] \cup [3, 4]$, with the usual metric. This set
is disconnected because the sets $[1, 2]$ and $[3, 4]$ are open relative to $X$ (why?).


Intuitively, a disconnected set is one which can be separated into two disjoint open 
sets; a connected set is one which cannot be separated in this manner. We defined 
what it means for a metric space to be connected; we can also define what it means 
for a set to be connected. 


Definition 2.4.3 (Connected sets) Let $(X, d)$ be a metric space, and let $Y$ be a subset
of $X$. We say that $Y$ is connected iff the metric space $(Y, d|_{Y \times Y})$ is connected, and
we say that $Y$ is disconnected iff the metric space $(Y, d|_{Y \times Y})$ is disconnected.


Remark 2.4.4 This definition is intrinsic; whether a set $Y$ is connected or not depends
only on what the metric is doing on $Y$, but not on what ambient space $X$ one is placing
$Y$ in.


On the real line, connected sets are easy to describe. 


Theorem 2.4.5 Let X be a non-empty subset of the real line R. Then the following 
statements are equivalent. 


(a) X is connected. 
(b) Whenever x, y € X and x < y, the interval [x, y] is also contained in X. 
(c) X is an interval (in the sense of Definition 9.1.1). 


Proof First we show that (a) implies (b). Suppose that $X$ is connected, and suppose
for sake of contradiction that we could find points $x < y$ in $X$ such that $[x, y]$ is not
contained in $X$. Then there exists a real number $x < z < y$ such that $z \notin X$. Thus
the sets $(-\infty, z) \cap X$ and $(z, \infty) \cap X$ will cover $X$. But these sets are non-empty
(because they contain $x$ and $y$, respectively) and are open relative to $X$, and so $X$ is
disconnected, a contradiction.

Now we show that (b) implies (a). Let $X$ be a set obeying the property (b). Suppose
for sake of contradiction that $X$ is disconnected. Then there exist disjoint non-empty
sets $V, W$ which are open relative to $X$, such that $V \cup W = X$. Since $V$ and $W$ are
non-empty, we may choose an $x \in V$ and $y \in W$. Since $V$ and $W$ are disjoint, we
have $x \neq y$; without loss of generality we may assume $x < y$. By property (b), we
know that the entire interval $[x, y]$ is contained in $X$.

Now consider the set $[x, y] \cap V$. This set is both bounded and non-empty (because
it contains $x$). Thus it has a supremum

$$z := \sup([x, y] \cap V).$$

Clearly $z \in [x, y]$, and hence $z \in X$. Thus either $z \in V$ or $z \in W$. Suppose first that
$z \in V$. Then $z \neq y$ (since $y \in W$ and $V$ is disjoint from $W$). But $V$ is open relative to
$X$, which contains $[x, y]$, so there is some ball $B_{([x,y],d)}(z, r)$ which is contained in $V$.
But this contradicts the fact that $z$ is the supremum of $[x, y] \cap V$. Now suppose that
$z \in W$. Then $z \neq x$ (since $x \in V$ and $V$ is disjoint from $W$). But $W$ is open relative
to $X$, which contains $[x, y]$, so there is some ball $B_{([x,y],d)}(z, r)$ which is contained
in $W$. But this again contradicts the fact that $z$ is the supremum of $[x, y] \cap V$. Thus
in either case we obtain a contradiction, which means that $X$ cannot be disconnected
and must therefore be connected.

It remains to show that (b) and (c) are equivalent; we leave this to Exercise 2.4.3. □


Continuous functions map connected sets to connected sets: 


Theorem 2.4.6 (Continuity preserves connectedness) Let $f : X \to Y$ be a continuous
map from one metric space $(X, d_X)$ to another $(Y, d_Y)$. Let $E$ be any connected
subset of $X$. Then $f(E)$ is also connected.


Proof See Exercise 2.4.4. □


An important corollary of this result is the intermediate value theorem, general- 
izing Theorem 9.7.1. 


Corollary 2.4.7 (Intermediate value theorem) Let $f : X \to \mathbf{R}$ be a continuous map
from one metric space $(X, d_X)$ to the real line. Let $E$ be any connected subset of $X$,
and let $a, b$ be any two elements of $E$. Let $y$ be a real number between $f(a)$ and $f(b)$,
i.e., either $f(a) \leq y \leq f(b)$ or $f(a) \geq y \geq f(b)$. Then there exists $c \in E$ such that
$f(c) = y$.

Proof See Exercise 2.4.5. □

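
In the special case where $E$ is an interval $[a, b]$ of the real line, the intermediate value theorem is what makes the bisection method work: the theorem guarantees that a point $c$ with $f(c) = y$ exists, and repeated halving locates it approximately. The Python sketch below is our own illustration under those assumptions (the name `bisect_value` and the tolerance parameter are hypothetical); it is not a construction used in the text.

```python
def bisect_value(f, a, b, y, tol=1e-12):
    """Approximate a point c in [a, b] with f(c) = y, assuming f is continuous
    on [a, b] and y lies between f(a) and f(b)."""
    fa, fb = f(a) - y, f(b) - y
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("y is not between f(a) and f(b)")
    # Orient the endpoints so that f(lo) < y < f(hi).
    lo, hi = (a, b) if fa < 0 else (b, a)
    while abs(hi - lo) > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c = bisect_value(lambda x: x**3 + x, 0.0, 2.0, 5.0)
print(c, c**3 + c)
```

Bisection relies on the ordering of the real line, so it does not extend to a general connected set $E$; there the corollary still asserts that $c$ exists but gives no procedure for finding it.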
— Exercises — 


Exercise 2.4.1 Let $(X, d_{disc})$ be a metric space with the discrete metric. Let $E$ be a
subset of $X$ which contains at least two elements. Show that $E$ is disconnected.


Exercise 2.4.2 Let $f : X \to Y$ be a function from a connected metric space $(X, d)$
to a metric space $(Y, d_{disc})$ with the discrete metric. Show that $f$ is continuous if and
only if it is constant. (Hint: use Exercise 2.4.1.)


Exercise 2.4.3. Prove the equivalence of statements (b) and (c) in Theorem 2.4.5. 


Exercise 2.4.4 Prove Theorem 2.4.6. (Hint: the formulation of continuity in Theo- 
rem 2.1.5(c) is the most convenient to use.) 


Exercise 2.4.5 Use Theorem 2.4.6 to prove Corollary 2.4.7. 


Exercise 2.4.6 Let $(X, d)$ be a metric space, and let $(E_\alpha)_{\alpha \in I}$ be a collection of
connected sets in $X$ with $I$ non-empty. Suppose also that $\bigcap_{\alpha \in I} E_\alpha$ is non-empty.
Show that $\bigcup_{\alpha \in I} E_\alpha$ is connected.


Exercise 2.4.7 Let $(X, d)$ be a metric space, and let $E$ be a subset of $X$. We say
that $E$ is path-connected iff, for every $x, y \in E$, there exists a continuous function
$\gamma : [0, 1] \to E$ from the unit interval $[0, 1]$ to $E$ such that $\gamma(0) = x$ and $\gamma(1) = y$.
Show that every non-empty path-connected set is connected. (The converse is false,
but is a bit tricky to show and will not be detailed here.)


Exercise 2.4.8 Let $(X, d)$ be a metric space, and let $E$ be a subset of $X$. Show that
if $E$ is connected, then the closure $\overline{E}$ of $E$ is also connected. Is the converse true?


Exercise 2.4.9 Let (X, d) be a metric space. Let us define a relation x ~ y on X by 
declaring x ~ y iff there exists a connected subset of X which contains both x and 
y. Show that this is an equivalence relation (i.e., it obeys the reflexive, symmetric, 
and transitive axioms). Also, show that the equivalence classes of this relation (i.e.,
the sets of the form $\{y \in X : y \sim x\}$ for some $x \in X$) are all closed and connected.
(Hint: use Exercise 2.4.6 and Exercise 2.4.8.) These sets are known as the connected 
components of X. 


Exercise 2.4.10 Combine Proposition 2.3.2 and Corollary 2.4.7 to deduce a the- 
orem for continuous functions on a compact connected domain which generalizes 
Corollary 9.7.4. 


2.5 Topological Spaces (Optional) 


The concept of a metric space can be generalized to that of a topological space. The 
idea here is not to view the metric d as the fundamental object; indeed, in a general 
topological space there is no metric at all. Instead, it is the collection of open sets 
which is the fundamental concept. Thus, whereas in a metric space one introduces the 
metric d first, and then uses the metric to define first the concept of an open ball and 
then the concept of an open set, in a topological space one starts just with the notion 
of an open set. As it turns out, starting from the open sets, one cannot necessarily 
reconstruct a usable notion of a ball or metric (thus not all topological spaces will 
be metric spaces), but remarkably one can still define many of the concepts in the 
preceding sections. 

We will not use topological spaces at all in this text, and so we shall be rather 
brief in our treatment of them here. A more complete study of these spaces can of 
course be found in any topology textbook or a more advanced analysis text. 


Definition 2.5.1 (Topological spaces) A topological space is a pair (X, F), where 
X is a set and $F \subseteq 2^X$ is a collection of subsets of X, whose elements are referred 
to as open sets. Furthermore, the collection F must obey the following properties: 


• The empty set and the whole set X are open; in other words, ∅ ∈ F and X ∈ F. 

• Any finite intersection of open sets is open. In other words, if $V_1, \ldots, V_n$ are elements 
of F, then $V_1 \cap \cdots \cap V_n$ is also in F. 

• Any arbitrary union of open sets is open (including infinite unions). In other words, 
if $(V_\alpha)_{\alpha \in I}$ is a family of sets in F, then $\bigcup_{\alpha \in I} V_\alpha$ is also in F. 
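When X is a finite set, the three axioms of Definition 2.5.1 can be checked mechanically. The following Python sketch is only an illustration under that finiteness assumption; the helper name `is_topology` and the sample families are introduced here for the example and are not notation from the text.

```python
from itertools import combinations

def is_topology(X, F):
    """Check the axioms of Definition 2.5.1 for a finite set X and a family F of subsets."""
    X = frozenset(X)
    F = {frozenset(V) for V in F}
    # Every member of F must be a subset of X.
    if any(not V <= X for V in F):
        return False
    # The empty set and the whole set must be open.
    if frozenset() not in F or X not in F:
        return False
    # F is finite, so closure under pairwise intersections and unions
    # already gives closure under all finite intersections and all unions.
    for V, W in combinations(F, 2):
        if V & W not in F or V | W not in F:
            return False
    return True

X = {1, 2, 3}
trivial = [set(), X]                      # the trivial topology
nested = [set(), {1}, {1, 2}, X]          # a chain of open sets
not_a_topology = [set(), {1}, {2}, X]     # {1} | {2} = {1, 2} is missing

print(is_topology(X, trivial))
print(is_topology(X, nested))
print(is_topology(X, not_a_topology))
```

For an infinite set X the union axiom involves arbitrary families, so no finite check of this kind is available; the sketch is meant only to make the axioms concrete.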


In many cases, the collection F of open sets can be deduced from context, and we 
shall refer to the topological space (X, F) simply as X. 

From Proposition 1.2.15 we see that every metric space (X, d) is automatically 
also a topological space (if we set F equal to the collection of sets which are open in 
(X, d)). However, there do exist topological spaces which do not arise from metric 
spaces (see Exercises 2.5.1 and 2.5.6). 

We now develop the analogues of various notions in this chapter and the previous 
chapter for topological spaces. The notion of a ball must be replaced by the notion 
of a neighbourhood. 


Definition 2.5.2 (Neighborhoods) Let (X, F) be a topological space, and let x ∈ X. 
A neighborhood of x is defined to be any open set in F which contains x. 


Example 2.5.3 If (X, d) is a metric space, x ∈ X, and r > 0, then B(x, r) is a 
neighborhood of x. 


Definition 2.5.4 (Topological convergence) Let m be an integer, (X, F) be a 
topological space and let $(x^{(n)})_{n=m}^\infty$ be a sequence of points in X. Let x be a point in X. 
We say that $(x^{(n)})_{n=m}^\infty$ converges to x if and only if, for every neighborhood V of x, 
there exists an N ≥ m such that $x^{(n)} \in V$ for all n ≥ N. 


This notion is consistent with that of convergence in metric spaces (Exercise 
2.5.2). One can then ask whether one has the basic property of uniqueness of limits 
(Proposition 1.1.20). The answer turns out to usually be yes—if the topological space 
has an additional property known as the Hausdorff property—but the answer can be 
no for other topologies; see Exercise 2.5.4. 


Definition 2.5.5 (Interior, exterior, boundary) Let (X, F) be a topological space, 
let E be a subset of X, and let $x_0$ be a point in X. We say that $x_0$ is an interior point 
of E if there exists a neighborhood V of $x_0$ such that V ⊆ E. We say that $x_0$ is an 
exterior point of E if there exists a neighborhood V of $x_0$ such that V ∩ E = ∅. We 
say that $x_0$ is a boundary point of E if it is neither an interior point nor an exterior 
point of E. 


This definition is consistent with the corresponding notion for metric spaces (Exer- 
cise 2.5.3). 


Definition 2.5.6 (Closure) Let (X, F) be a topological space, let E be a subset of 
X, and let $x_0$ be a point in X. We say that $x_0$ is an adherent point of E if every 
neighborhood V of $x_0$ has a non-empty intersection with E. The set of all adherent 
points of E is called the closure of E and is denoted $\overline{E}$. 
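On a finite topological space, the interior, exterior, boundary, and closure can be computed directly from the definitions. The sketch below is an illustration only: the function names and the three-point space are assumptions made for this example.

```python
def interior(E, X, F):
    """Points x with some open V such that x in V and V is a subset of E."""
    E = frozenset(E)
    return {x for x in X if any(x in V and V <= E for V in F)}

def exterior(E, X, F):
    """Points x with some open V such that x in V and V is disjoint from E."""
    E = frozenset(E)
    return {x for x in X if any(x in V and not (V & E) for V in F)}

def closure(E, X, F):
    """Adherent points: every open V containing x meets E."""
    E = frozenset(E)
    return {x for x in X if all(V & E for V in F if x in V)}

# A hypothetical three-point space whose open sets form a chain.
X = frozenset({1, 2, 3})
F = [frozenset(), frozenset({1}), frozenset({1, 2}), X]
E = {2}

boundary = set(X) - interior(E, X, F) - exterior(E, X, F)
print(interior(E, X, F), exterior(E, X, F), boundary, closure(E, X, F))
```

In this example the closure of E coincides with the union of its interior and boundary, which is the equivalence of (a) and (b) that Exercise 2.5.9 asks for in general.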


There is a partial analogue of Proposition 1.2.10; see Exercise 2.5.9. 

We define a set K in a topological space (X, F) to be closed iff its complement 
X\K is open; this is consistent with the metric space definition, thanks to Proposition 
1.2.15(e). Some partial analogues of that proposition are true (see Exercise 2.5.10). 

To define the notion of a relative topology, we cannot use Definition 1.3.3 as this 
requires a metric function. However, we can instead use Proposition 1.3.4 as our 
starting point: 


Definition 2.5.7 (Relative topology) Let (X, F) be a topological space, and Y be a 
subset of X. Then we define $F_Y := \{V \cap Y : V \in F\}$ and refer to this as the topology 
on Y induced by (X, F). We call $(Y, F_Y)$ a topological subspace of (X, F). This is 
indeed a topological space; see Exercise 2.5.11. 
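For a finite space the induced topology of Definition 2.5.7 is a one-line computation. The following sketch uses a hypothetical three-point space and a helper name, `induced_topology`, chosen for this illustration.

```python
def induced_topology(Y, F):
    """Return F_Y = {V ∩ Y : V in F} as a set of frozensets."""
    Y = frozenset(Y)
    return {frozenset(V) & Y for V in F}

X = frozenset({1, 2, 3})
F = [frozenset(), frozenset({1}), frozenset({1, 2}), X]
Y = {2, 3}

F_Y = induced_topology(Y, F)
print(sorted(sorted(V) for V in F_Y))
```

Here {2} is open in the subspace Y (it equals {1, 2} ∩ Y) even though it is not open in X, so openness depends on the ambient space.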


From Proposition 1.3.4 we see that this notion is compatible with the one for 
metric spaces. 
Next we define the notion of continuity. 


Definition 2.5.8 (Continuous functions) Let $(X, F_X)$ and $(Y, F_Y)$ be topological 
spaces, and let f : X → Y be a function. If $x_0 \in X$, we say that f is continuous at 
$x_0$ iff for every neighborhood V of $f(x_0)$, there exists a neighborhood U of $x_0$ such 
that f(U) ⊆ V. We say that f is continuous iff it is continuous at every point x ∈ X. 


This definition is consistent with that in Definition 2.1.1 (Exercise 2.5.14). Partial 
analogues of Theorems 2.1.4 and 2.1.5 are available (Exercise 2.5.15). In particular, 
a function is continuous iff the pre-image of every open set is open. 
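The pre-image criterion is easy to test on finite spaces. The sketch below is an illustration under assumptions made here: functions are stored as Python dictionaries, and the two small spaces and the helper names are hypothetical.

```python
def preimage(f, V):
    """Pre-image of V under f, where f is a dict mapping points of X to points of Y."""
    return frozenset(x for x in f if f[x] in V)

def is_continuous(f, FX, FY):
    """True iff the pre-image of every open set of Y is open in X."""
    open_X = {frozenset(U) for U in FX}
    return all(preimage(f, V) in open_X for V in FY)

FX = [set(), {1}, {1, 2}, {1, 2, 3}]   # open sets of X = {1, 2, 3}
FY = [set(), {"a"}, {"a", "b"}]        # open sets of Y = {"a", "b"}

f = {1: "a", 2: "a", 3: "b"}   # pre-image of {"a"} is {1, 2}, which is open
g = {1: "b", 2: "a", 3: "a"}   # pre-image of {"a"} is {2, 3}, which is not open

print(is_continuous(f, FX, FY), is_continuous(g, FX, FY))
```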

There is unfortunately no notion of a Cauchy sequence, a complete space, or a 
bounded space, for general topological spaces. However, there is certainly a notion 
of a compact space, as we can see by taking Theorem 1.5.8 as our starting point: 


Definition 2.5.9 (Compact topological spaces) Let (X, F) be a topological space. 
We say that this space is compact if every open cover of X has a finite subcover. If 
Y is a subset of X, we say that Y is compact if the topological space on Y induced 
by (X, F) is compact. 


Many basic facts about compact metric spaces continue to hold true for compact 
topological spaces, notably Theorem 2.3.1 and Proposition 2.3.2 (Exercise 2.5.16). 
However, there is no notion of uniform continuity, and so there is no analogue of 
Theorem 2.3.5. 

We can also define the notion of connectedness by repeating Definition 2.4.1 
verbatim and also repeating Definition 2.4.3 (but with Definition 2.5.7 instead of 
Definition 1.3.3). Many of the results and exercises in Sect. 2.4 continue to hold for 
topological spaces (with almost no changes to any of the proofs!). 


— Exercises — 


Exercise 2.5.1 Let X be an arbitrary set, and let F := {∅, X}. Show that (X, F) is 
a topology (called the trivial topology on X). If X contains more than one element, 
show that the trivial topology cannot be obtained by placing a metric d on X. 
Show that this topological space is both compact and connected. 


Exercise 2.5.2 Let (X, d) be a metric space (and hence a topological space). Show 
that the two notions of convergence of sequences in Definition 1.1.14 and Definition 
2.5.4 coincide. 


Exercise 2.5.3 Let (X, d) be a metric space (and hence a topological space). Show 
that the two notions of interior, exterior, and boundary in Definition 1.2.5 and 
Definition 2.5.5 coincide. 


Exercise 2.5.4 A topological space (X, F) is said to be Hausdorff if given any two 
distinct points x, y ∈ X, there exists a neighborhood V of x and a neighborhood W 
of y such that V ∩ W = ∅. Show that any topological space coming from a metric 
space is Hausdorff, and show that the trivial topology is not Hausdorff. Show that 
the analogue of Proposition 1.1.20 holds for Hausdorff topological spaces, but give 
an example of a non-Hausdorff topological space in which Proposition 1.1.20 fails. 
(In practice, most topological spaces one works with are Hausdorff; non-Hausdorff 
topological spaces tend to be so pathological that it is not very profitable to work 
with them.) 


Exercise 2.5.5 Given any totally ordered set X with order relation <, declare a 
set V ⊆ X to be open if for every x ∈ V there exists a set I which is an interval 
{y ∈ X : a < y < b} for some a, b ∈ X, a ray {y ∈ X : a < y} for some a ∈ X, the 
ray {y ∈ X : y < b} for some b ∈ X, or the whole space X, which contains x and 
is contained in V. Let F be the set of all open subsets of X. Show that (X, F) 
is a topology (this is the order topology on the totally ordered set (X, <)) which 
is Hausdorff in the sense of Exercise 2.5.4. Show that on the real line R (with 
the standard ordering <), the order topology matches the standard topology (i.e., 
the topology arising from the standard metric). If instead one applies this to the 
extended real line R*, show that R is an open set with boundary {−∞, +∞}. If 
$(x_n)_{n=1}^\infty$ is a sequence of numbers in R (and hence in R*), show that $x_n$ converges 
to +∞ if and only if $\liminf_{n \to \infty} x_n = +\infty$, and $x_n$ converges to −∞ if and only if 
$\limsup_{n \to \infty} x_n = -\infty$. 


Exercise 2.5.6 Let X be an uncountable set, and let F be the collection of all subsets 
E in X which are either empty or cofinite (which means that X\E is finite). Show 
that (X, F) is a topology (this is called the cofinite topology on X) which is not 
Hausdorff in the sense of Exercise 2.5.4 and is compact and connected. Also, show 
that if x ∈ X and $(V_n)_{n=1}^\infty$ is any countable collection of open sets containing x, then 
$\bigcap_{n=1}^\infty V_n \neq \{x\}$. Use this to show that the cofinite topology cannot be obtained by 
placing a metric d on X. (Hint: what is the set $\bigcap_{n=1}^\infty B(x, 1/n)$ equal to in a metric 
space?) 


Exercise 2.5.7 Let X be an uncountable set, and let F be the collection of all subsets 
E in X which are either empty or cocountable (which means that X\E is at most 
countable). Show that (X, F) is a topology (this is called the cocountable topology 
on X) which is not Hausdorff in the sense of Exercise 2.5.4, and connected, but 
cannot arise from a metric space and is not compact. 


Exercise 2.5.8 Let (X, F) be a compact topological space. Assume that this space is 
first countable, which means that for every x ∈ X there exists a countable collection 
$V_1, V_2, \ldots$ of neighborhoods of x, such that every neighborhood of x contains one of 
the $V_n$. Show that every sequence in X has a convergent subsequence, by modifying 
Exercise 1.5.11. 


Exercise 2.5.9 Prove the following partial analogue of Proposition 1.2.10 for 
topological spaces: (c) implies both (a) and (b), which are equivalent to each other. Show 
that in the cocountable topology in Exercise 2.5.7, it is possible for (a) and (b) to 
hold without (c) holding. 


Exercise 2.5.10 Let E be a subset of a topological space (X, F). Show that E is 
open if and only if every element of E is an interior point, and show that E is closed 
if and only if E contains all of its adherent points. Prove analogues of Proposition 
1.2.15(e)-(h) (some of these are automatic by definition). If we assume in addition 
that X is Hausdorff, prove an analogue of Proposition 1.2.15(d) also, but give an 
example to show that (d) can fail when X is not Hausdorff. 


Exercise 2.5.11 Show that the pair (Y, Fy) defined in Definition 2.5.7 is indeed a 
topological space. 


Exercise 2.5.12 Generalize Corollary 1.5.9 to compact sets in a Hausdorff topolog- 
ical space. 


Exercise 2.5.13 Generalize Theorem 1.5.10 to compact sets in a Hausdorff topo- 
logical space. 


Exercise 2.5.14 Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces (and hence topological 
spaces). Show that the two notions of continuity (both at a point, and on the whole 
domain) of a function f : X → Y in Definition 2.1.1 and Definition 2.5.8 coincide. 


Exercise 2.5.15 Show that when Theorem 2.1.4 is extended to topological spaces, 
(a) implies (b). (The converse is false, but constructing an example is difficult.) 
Show that when Theorem 2.1.5 is extended to topological spaces, (a), (c), (d) 
are all equivalent to each other and imply (b). (Again, the converse implications are 
false, but difficult to prove.) 


Exercise 2.5.16 Generalize both Theorem 2.3.1 and Proposition 2.3.2 to compact 
sets in a topological space. 


Chapter 3 
Uniform Convergence 


In the previous two chapters we have seen what it means for a sequence $(x^{(n)})_{n=1}^\infty$ 
of points in a metric space $(X, d_X)$ to converge to a limit x; it means that 
$\lim_{n \to \infty} d_X(x^{(n)}, x) = 0$, or equivalently that for every ε > 0 there exists an N > 0 such that 
$d_X(x^{(n)}, x) < \varepsilon$ for all n > N. (We have also generalized the notion of convergence 
to topological spaces (X, F), but in this chapter we will focus on metric spaces.) 

In this chapter, we consider what it means for a sequence of functions $(f^{(n)})_{n=1}^\infty$ 
from one metric space $(X, d_X)$ to another $(Y, d_Y)$ to converge. In other words, we 
have a sequence of functions $f^{(1)}, f^{(2)}, \ldots$, with each function $f^{(n)} : X \to Y$ being 
a function from X to Y, and we ask what it means for this sequence of functions to 
converge to some limiting function f. 

It turns out that there are several different concepts of convergence of functions; 
here we describe the two most important ones, pointwise convergence and uniform 
convergence. (There are other types of convergence for functions, such as $L^1$ convergence, 
$L^2$ convergence, convergence in measure, almost everywhere convergence, 
and so forth, but these are beyond the scope of this text.) The two notions are related, 
but not identical; the relationship between the two is somewhat analogous to the 
relationship between continuity and uniform continuity. 

Once we work out what convergence means for functions, and thus can make sense 
of such statements as $\lim_{n \to \infty} f^{(n)} = f$, we will then ask how these limits interact 
with other concepts. For instance, we already have a notion of limiting values of 
functions: $\lim_{x \to x_0; x \in X} f(x)$. Can we interchange limits, i.e., 

$$\lim_{n \to \infty} \lim_{x \to x_0; x \in X} f^{(n)}(x) = \lim_{x \to x_0; x \in X} \lim_{n \to \infty} f^{(n)}(x)?$$

As we shall see, the answer depends on what type of convergence we have for $f^{(n)}$. 
We will also address similar questions involving interchanging limits and integrals, 
or limits and sums, or sums and integrals. 


3.1 Limiting Values of Functions 


Before we talk about limits of sequences of functions, we should first discuss a 
similar, but distinct, notion, that of limiting values of functions. We shall focus on 
the situation for metric spaces, but there are similar notions for topological spaces 
(Exercise 3.1.3). 


Definition 3.1.1 (Limiting value of a function) Let $(X, d_X)$ and $(Y, d_Y)$ be metric 
spaces, let E be a subset of X, and let f : E → Y be a function. If $x_0 \in X$ is an 
adherent point of E, and L ∈ Y, we say that f(x) converges to L in Y as x converges 
to $x_0$ in E, or write $\lim_{x \to x_0; x \in E} f(x) = L$, if for every ε > 0 there exists a δ > 0 
such that $d_Y(f(x), L) < \varepsilon$ for all x ∈ E such that $d_X(x, x_0) < \delta$. 


Remark 3.1.2 Some authors exclude the case $x = x_0$ from the above definition, 
thus requiring $0 < d_X(x, x_0) < \delta$. In our current notation, this would correspond 
to removing $x_0$ from E, thus one would consider $\lim_{x \to x_0; x \in E \setminus \{x_0\}} f(x)$ instead of 
$\lim_{x \to x_0; x \in E} f(x)$. See Exercise 3.1.1 for a comparison of the two concepts. 


Comparing this with Definition 2.1.1, we see that f is continuous at $x_0$ if and only 
if 

$$\lim_{x \to x_0; x \in X} f(x) = f(x_0).$$

Thus f is continuous on X if we have 

$$\lim_{x \to x_0; x \in X} f(x) = f(x_0) \text{ for all } x_0 \in X.$$


Example 3.1.3 If f : R → R is the function $f(x) = x^2 - 4$, then 

$$\lim_{x \to 1} f(x) = f(1) = 1 - 4 = -3$$

since f is continuous. 


Remark 3.1.4 Often we shall omit the condition x ∈ X, and abbreviate $\lim_{x \to x_0; x \in X} f(x)$ 
as simply $\lim_{x \to x_0} f(x)$ when it is clear what space x will range in. 


One can rephrase Definition 3.1.1 in terms of sequences: 


Proposition 3.1.5 Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, let E be a subset of X, 
and let f : E → Y be a function. Let $x_0 \in X$ be an adherent point of E and L ∈ Y. 
Then the following four statements are logically equivalent: 


(a) $\lim_{x \to x_0; x \in E} f(x) = L$. 

(b) For every sequence $(x^{(n)})_{n=1}^\infty$ in E which converges to $x_0$ with respect to the 
metric $d_X$, the sequence $(f(x^{(n)}))_{n=1}^\infty$ converges to L with respect to the metric 
$d_Y$. 

(c) For every open set V ⊆ Y which contains L, there exists an open set U ⊆ X 
containing $x_0$ such that f(U ∩ E) ⊆ V. 

(d) If one defines the function $g : E \cup \{x_0\} \to Y$ by defining $g(x_0) := L$, and g(x) := 
f(x) for $x \in E \setminus \{x_0\}$, then g is continuous at $x_0$. Furthermore, if $x_0 \in E$, then 
$f(x_0) = L$. 

Proof See Exercise 3.1.2. □ 


Remark 3.1.6 Observe from Propositions 3.1.5(b) and 1.1.20 that a function f(x) 
can converge to at most one limit L as x converges to $x_0$. In other words, if the limit 

$$\lim_{x \to x_0; x \in E} f(x)$$

exists at all, then it can only take at most one value. 


Remark 3.1.7 The requirement that $x_0$ be an adherent point of E is necessary for 
the concept of limiting value to be useful; otherwise $x_0$ will lie in the exterior of 
E, and the notion that f(x) converges to L as x converges to $x_0$ in E is vacuous (for δ 
sufficiently small, there are no points x ∈ E so that $d(x, x_0) < \delta$). 


Remark 3.1.8 Strictly speaking, we should write 

$$d_Y\text{-}\lim_{x \to x_0; x \in E} f(x) \quad \text{instead of} \quad \lim_{x \to x_0; x \in E} f(x),$$

since the convergence depends on the metric $d_Y$. However in practice it will be 
obvious what the metric $d_Y$ is and so we will omit the $d_Y$- prefix from the notation. 


— Exercises — 


Exercise 3.1.1 Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, let E be a subset of X, 
let f : E → Y be a function, and let $x_0$ be an element of E. Assume that $x_0$ is an 
adherent point of $E \setminus \{x_0\}$ (or equivalently, that $x_0$ is not an isolated point of E). Show 
that the limit $\lim_{x \to x_0; x \in E} f(x)$ exists if and only if the limit $\lim_{x \to x_0; x \in E \setminus \{x_0\}} f(x)$ 
exists and is equal to $f(x_0)$. Also, show that if the limit $\lim_{x \to x_0; x \in E} f(x)$ exists at 
all, then it must equal $f(x_0)$. 


Exercise 3.1.2 Prove Proposition 3.1.5. (Hint: review your proof of Theorem 2.1.4.) 


Exercise 3.1.3 Use Proposition 3.1.5(c) to define a notion of a limiting value of a 
function f : E → Y from one topological space $(X, F_X)$ to another $(Y, F_Y)$, with E 
a subset of X. If X is a topological space and Y is a Hausdorff topological space (see 
Exercise 2.5.4), prove the equivalence of Proposition 3.1.5(c) and (d), as well as an 
analogue of Remark 3.1.6. What happens to these statements if Y is not assumed to 
be Hausdorff? 


Exercise 3.1.4 Recall from Exercise 2.5.5 that the extended real line R* comes 
with a standard topology (the order topology). We view the natural numbers N as 
a subspace of this topological space, and +∞ as an adherent point of N in R*. 
Let $(a_n)_{n=0}^\infty$ be a sequence taking values in a topological space $(Y, F_Y)$, and let 
L ∈ Y. Show that $\lim_{n \to +\infty; n \in \mathbf{N}} a_n = L$ (in the sense of Exercise 3.1.3) if and only 
if $\lim_{n \to \infty} a_n = L$ (in the sense of Definition 2.5.4). This shows that the notions of 
limiting values of a sequence, and limiting values of a function, are compatible. 


Exercise 3.1.5 Let $(X, d_X)$, $(Y, d_Y)$, $(Z, d_Z)$ be metric spaces, let E be a subset of 
X, and let $x_0 \in X$, $y_0 \in Y$, $z_0 \in Z$. Let f : E → Y and g : Y → Z be functions. 
If we have $\lim_{x \to x_0; x \in E} f(x) = y_0$ and $\lim_{y \to y_0; y \in f(E)} g(y) = z_0$, 
conclude that $\lim_{x \to x_0; x \in E} g \circ f(x) = z_0$. 


Exercise 3.1.6 State and prove an analogue of the limit laws in Proposition 9.3.14 
when X is now a metric space rather than a subset of R. (Hint: use Corollary 2.2.3.) 


3.2 Pointwise and Uniform Convergence 


The most obvious notion of convergence of functions is pointwise convergence, or 
convergence at each point of the domain: 


Definition 3.2.1 (Pointwise convergence) Let $(f^{(n)})_{n=1}^\infty$ be a sequence of functions 
from one metric space $(X, d_X)$ to another $(Y, d_Y)$, and let f : X → Y be another 
function. We say that $(f^{(n)})_{n=1}^\infty$ converges pointwise to f on X if we have 

$$\lim_{n \to \infty} f^{(n)}(x) = f(x)$$

for all x ∈ X, i.e., 

$$\lim_{n \to \infty} d_Y(f^{(n)}(x), f(x)) = 0.$$

Or in other words, for every x and every ε > 0 there exists N > 0 such that 
$d_Y(f^{(n)}(x), f(x)) < \varepsilon$ for every n > N. We call the function f the pointwise limit 
of the functions $f^{(n)}$. 


Remark 3.2.2 Note that $f^{(n)}(x)$ and f(x) are points in Y, rather than functions, so we 
are using our prior notion of convergence in metric spaces to determine convergence 
of functions. Also note that we are not really using the fact that $(X, d_X)$ is a metric 
space (i.e., we are not using the metric $d_X$); for this definition it would suffice for X 
to just be a plain old set with no metric structure. However, later on we shall want 
to restrict our attention to continuous functions from X to Y, and in order to do so 
we need a metric on X (and on Y), or at least a topological structure. Also when we 
introduce the concept of uniform convergence, then we will definitely need a metric 
structure on X and Y; there is no comparable notion for topological spaces. 


Example 3.2.3 Consider the functions $f^{(n)} : \mathbf{R} \to \mathbf{R}$ defined by $f^{(n)}(x) := x/n$, 
while f : R → R is the zero function f(x) := 0. Then $f^{(n)}$ converges pointwise 
to f, since for each fixed real number x we have $\lim_{n \to \infty} f^{(n)}(x) = \lim_{n \to \infty} x/n = 0 = f(x)$. 

From Proposition 1.1.20 we see that a sequence $(f^{(n)})_{n=1}^\infty$ of functions from one 
metric space $(X, d_X)$ to another $(Y, d_Y)$ can have at most one pointwise limit f (this 
explains why we can refer to f as the pointwise limit). However, it is of course 
possible for a sequence of functions to have no pointwise limit (can you think of an 
example?), just as a sequence of points in a metric space does not necessarily have a 
limit. 

Pointwise convergence is a very natural concept, but it has a number of disadvan- 
tages: it does not preserve continuity, derivatives, limits, or integrals, as the following 
three examples show. 


Example 3.2.4 Consider the functions $f^{(n)} : [0, 1] \to \mathbf{R}$ defined by $f^{(n)}(x) := x^n$, 
and let f : [0, 1] → R be the function defined by setting f(x) := 1 when x = 1 and 
f(x) := 0 when 0 ≤ x < 1. Then the functions $f^{(n)}$ are continuous, and converge 
pointwise to f on [0, 1] (why? Treat the cases x = 1 and 0 ≤ x < 1 separately); 
however, the limiting function f is not continuous. Note that the same example 
shows that pointwise convergence does not preserve differentiability either. 
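The behaviour in Example 3.2.4 can be seen numerically. The following sketch simply tabulates $x^n$ at a few sample points; the function names `f_n` and `f_limit` and the chosen sample values are assumptions made for this illustration, and floating-point output is of course only suggestive, not a proof.

```python
def f_n(n, x):
    """The function f^(n)(x) = x**n on [0, 1]."""
    return x ** n

def f_limit(x):
    """The pointwise limit: 1 at x = 1 and 0 on [0, 1)."""
    return 1.0 if x == 1 else 0.0

for x in (0.0, 0.5, 0.9, 0.99, 1.0):
    values = [f_n(n, x) for n in (1, 10, 100, 1000)]
    print(x, values, "limit:", f_limit(x))
```

Each row tends to 0 except the row for x = 1, which is constantly 1, so the limit jumps at x = 1 even though every $f^{(n)}$ is continuous.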


Example 3.2.5 If $\lim_{x \to x_0; x \in E} f^{(n)}(x) = L$ for every n, and $f^{(n)}$ converges pointwise 
to f, we cannot always conclude that $\lim_{x \to x_0; x \in E} f(x) = L$. The 
previous example is also a counterexample here: observe that $\lim_{x \to 1; x \in [0,1)} x^n = 1$ 
for every n, but $x^n$ converges pointwise to the function f defined in the previous 
example, and $\lim_{x \to 1; x \in [0,1)} f(x) = 0$. In particular, we see that 

$$\lim_{n \to \infty} \lim_{x \to x_0; x \in X} f^{(n)}(x) \neq \lim_{x \to x_0; x \in X} \lim_{n \to \infty} f^{(n)}(x)$$

(cf. Example 1.2.8). Thus pointwise convergence does not preserve limits. 


Example 3.2.6 Suppose that $f^{(n)} : [a, b] \to \mathbf{R}$ is a sequence of Riemann-integrable 
functions on the interval [a, b]. If $\int_{[a,b]} f^{(n)} = L$ for every n, and $f^{(n)}$ converges 
pointwise to some new function f, this does not mean that $\int_{[a,b]} f = L$. An example 
comes by setting [a, b] := [0, 1], and letting $f^{(n)}$ be the function $f^{(n)}(x) := 2n$ when 
x ∈ [1/2n, 1/n], and $f^{(n)}(x) := 0$ for all other values of x. Then $f^{(n)}$ converges 
pointwise to the zero function f(x) := 0 (why?). On the other hand, $\int_{[0,1]} f^{(n)} = 1$ 
for every n, while $\int_{[0,1]} f = 0$. In particular, we have an example where 

$$\lim_{n \to \infty} \int_{[a,b]} f^{(n)} \neq \int_{[a,b]} \lim_{n \to \infty} f^{(n)}.$$

One may think that this counterexample has something to do with the $f^{(n)}$ being 
discontinuous, but one can easily modify this counterexample to make the $f^{(n)}$ 
continuous (can you see how?). 

Another example in the same spirit is the “moving bump” example. Let $f^{(n)} : \mathbf{R} \to \mathbf{R}$ 
be the function defined by $f^{(n)}(x) := 1$ if x ∈ [n, n + 1] and $f^{(n)}(x) := 0$ otherwise. 
Then $\int_{\mathbf{R}} f^{(n)} = 1$ for every n (where $\int_{\mathbf{R}} f$ is defined as the limit of $\int_{[-N,N]} f$ as 
N goes to infinity). On the other hand, $f^{(n)}$ converges pointwise to the zero function 
0 (why?), and $\int_{\mathbf{R}} 0 = 0$. In both of these examples, functions of area 1 have somehow 
“disappeared” to produce functions of area 0 in the limit. See also Example 1.2.9. 
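The spike functions of Example 3.2.6 can be examined with exact rational arithmetic. In the sketch below, the names `spike` and `spike_integral` are assumptions made for this illustration, and the integral is computed from the closed form height times width rather than by numerical quadrature.

```python
from fractions import Fraction

def spike(n, x):
    """f^(n)(x) = 2n on [1/(2n), 1/n] and 0 elsewhere."""
    return 2 * n if Fraction(1, 2 * n) <= x <= Fraction(1, n) else 0

def spike_integral(n):
    """Integral of f^(n) over [0, 1]: height 2n times width 1/n - 1/(2n)."""
    return 2 * n * (Fraction(1, n) - Fraction(1, 2 * n))

x = Fraction(1, 10)
print([spike(n, x) for n in range(1, 15)])      # zero once 1/n < x
print([spike_integral(n) for n in (1, 5, 50)])  # always 1
```

At any fixed x > 0 the values vanish as soon as 1/n < x, and at x = 0 they are always 0, so the pointwise limit is the zero function even though every integral equals 1.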


These examples show that pointwise convergence is too weak a concept to be 
of much use. The problem is that while $f^{(n)}(x)$ converges to f(x) for each x, the 
rate of that convergence varies substantially with x. For instance, consider the first 
example where $f^{(n)} : [0, 1] \to \mathbf{R}$ was the function $f^{(n)}(x) := x^n$, and f : [0, 1] → R 
was the function such that f(x) := 1 when x = 1, and f(x) := 0 otherwise. Then 
for each x, $f^{(n)}(x)$ converges to f(x) as n → ∞; this is the same as saying that 
$\lim_{n \to \infty} x^n = 0$ when 0 ≤ x < 1, and that $\lim_{n \to \infty} x^n = 1$ when x = 1. But the 
convergence is much slower near 1 than far away from 1. For instance, consider the 
statement that $\lim_{n \to \infty} x^n = 0$ for all 0 < x < 1. This means, for every 0 < x < 1, 
that for every ε, there exists an N ≥ 1 such that $|x^n| < \varepsilon$ for all n ≥ N; or in other 
words, the sequence $1, x, x^2, x^3, \ldots$ will eventually get less than ε, after passing 
some finite number N of elements in this sequence. But the number of elements 
N one needs to go out to depends very much on the location of x. For instance, 
take ε := 0.1. If x = 0.1, then we have $|x^n| < \varepsilon$ for all n ≥ 2; the sequence gets 
underneath ε after the second element. But if x = 0.5, then we only get $|x^n| < \varepsilon$ for 
n ≥ 4; you have to wait until the fourth element to get within ε of the limit. And if 
x = 0.9, then one only has $|x^n| < \varepsilon$ when n ≥ 22. Clearly, the closer x gets to 1, the 
longer one has to wait until $f^{(n)}(x)$ will get within ε of f(x), although it still will get 
there eventually. (Curiously, however, while the convergence gets worse and worse 
as x approaches 1, the convergence suddenly becomes perfect when x = 1.) 
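The thresholds quoted above (n ≥ 2, n ≥ 4, n ≥ 22 for ε = 0.1) can be recomputed with a short loop. This is a sketch only; the helper name `first_index_below` is an assumption made here, and it relies on the fact that $x^n$ is decreasing in n for 0 < x < 1, so the first index below ε works for all later indices too.

```python
def first_index_below(x, eps):
    """Smallest n >= 1 with x**n < eps, for 0 < x < 1 and eps > 0."""
    n = 1
    while x ** n >= eps:
        n += 1
    return n

eps = 0.1
for x in (0.1, 0.5, 0.9, 0.99, 0.999):
    print(x, first_index_below(x, eps))
```

The required index grows without bound as x approaches 1, which is exactly the failure of uniformity discussed in the text.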

To put things another way, the convergence of $f^{(n)}$ to f is not uniform in x: the 
N that one needs to get $f^{(n)}(x)$ within ε of f depends on x as well as on ε. This 
motivates a stronger notion of convergence. 


Definition 3.2.7 (Uniform convergence) Let $(f^{(n)})_{n=1}^\infty$ be a sequence of functions 
from one metric space $(X, d_X)$ to another $(Y, d_Y)$, and let f : X → Y be another 
function. We say that $(f^{(n)})_{n=1}^\infty$ converges uniformly to f on X if for every ε > 0 
there exists N > 0 such that $d_Y(f^{(n)}(x), f(x)) < \varepsilon$ for every n > N and x ∈ X. We 
call the function f the uniform limit of the functions $f^{(n)}$. 


Remark 3.2.8 Note that this definition is subtly different from the definition for 
pointwise convergence in Definition 3.2.1. In the definition of pointwise conver- 
gence, N was allowed to depend on x; now it is not. The reader should compare 
this distinction to the distinction between continuity and uniform continuity (i.e., 
between Definitions 2.1.1 and 2.3.4). A more precise formulation of this analogy is 
given in Exercise 3.2.1. 


It is easy to see that if $f^{(n)}$ converges uniformly to f on X, then it also converges 
pointwise to the same function f (see Exercise 3.2.2); thus when the uniform limit 
and pointwise limit both exist, then they have to be equal. However, the converse is 
not true; for instance the functions $f^{(n)} : [0, 1] \to \mathbf{R}$ defined earlier by $f^{(n)}(x) := x^n$ 
converge pointwise, but do not converge uniformly (see Exercise 3.2.2). 


Example 3.2.9 Let $f^{(n)} : [0, 1] \to \mathbf{R}$ be the functions $f^{(n)}(x) := x/n$, and let 
f : [0, 1] → R be the zero function f(x) := 0. Then it is clear that $f^{(n)}$ converges 
to f pointwise. Now we show that in fact $f^{(n)}$ converges to f uniformly. We have 
to show that for every ε > 0, there exists an N such that $|f^{(n)}(x) - f(x)| < \varepsilon$ for 
every x ∈ [0, 1] and every n > N. To show this, let us fix an ε > 0. Then for any 
x ∈ [0, 1] and n > N, we have 

$$|f^{(n)}(x) - f(x)| = |x/n - 0| = x/n \leq 1/n \leq 1/N.$$

Thus if we choose N such that N > 1/ε (note that this choice of N does not depend 
on what x is), then we have $|f^{(n)}(x) - f(x)| < \varepsilon$ for all n > N and x ∈ [0, 1], as 
desired. 
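The contrast between the uniformly convergent sequence x/n and the merely pointwise convergent sequence $x^n$ can be sketched numerically. The code below is an illustration under stated assumptions: the names `gap_linear` and `witness_power` and the evaluation grid are introduced here. For $x^n$ a fixed grid would be misleading, so the sketch instead evaluates at the moving point x = 1 − 1/n, which gives a lower bound for the supremum of $|x^n - 0|$ over [0, 1).

```python
def gap_linear(n, grid):
    """Max over the grid of |x/n - 0|; the true supremum on [0, 1] is 1/n, at x = 1."""
    return max(abs(x / n) for x in grid)

def witness_power(n):
    """Value of x**n at x = 1 - 1/n, a lower bound for sup |x**n - 0| on [0, 1)."""
    return (1.0 - 1.0 / n) ** n

grid = [k / 100 for k in range(101)]   # includes the endpoint x = 1
for n in (2, 10, 100, 1000):
    print(n, gap_linear(n, grid), witness_power(n))
```

The first gap shrinks like 1/n regardless of x, while the second stays at or above 1/4 for every n ≥ 2, so no single N can work for all x in the case of $x^n$.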


We make one trivial remark here: if a sequence $f^{(n)} : X \to Y$ of functions converges 
pointwise (or uniformly) to a function f : X → Y, then the restrictions 
$f^{(n)}|_E : E \to Y$ of $f^{(n)}$ to some subset E of X will also converge pointwise (or 
uniformly) to $f|_E$. (Why?) 


— Exercises — 


Exercise 3.2.1 The purpose of this exercise is to demonstrate a concrete relationship 
between continuity and pointwise convergence, and between uniform continuity and 
uniform convergence. Let f : R → R be a function. For any a ∈ R, let $f_a : \mathbf{R} \to \mathbf{R}$ 
be the shifted function $f_a(x) := f(x - a)$. 


(a) Show that f is continuous if and only if, whenever $(a_n)_{n=0}^\infty$ is a sequence of real 
numbers which converges to zero, the shifted functions $f_{a_n}$ converge pointwise 
to f. 

(b) Show that f is uniformly continuous if and only if, whenever $(a_n)_{n=0}^\infty$ is a 
sequence of real numbers which converges to zero, the shifted functions $f_{a_n}$ 
converge uniformly to f. 


Exercise 3.2.2 (a) Let $(f^{(n)})_{n=1}^\infty$ be a sequence of functions from one metric space 
$(X, d_X)$ to another $(Y, d_Y)$, and let f : X → Y be another function from X to Y. 
Show that if $f^{(n)}$ converges uniformly to f, then $f^{(n)}$ also converges pointwise 
to f. 

(b) For each integer n ≥ 1, let $f^{(n)} : (-1, 1) \to \mathbf{R}$ be the function $f^{(n)}(x) := x^n$. 
Prove that $f^{(n)}$ converges pointwise to the zero function 0, but does not converge 
uniformly to any function f : (−1, 1) → R. 

(c) Let g : (−1, 1) → R be the function g(x) := x/(1 − x). With the notation as in 
(b), show that the partial sums $\sum_{n=1}^N f^{(n)}$ converge pointwise as N → ∞ to g, 
but do not converge uniformly to g, on the open interval (−1, 1). (Hint: use 
Lemma 7.3.3.) What would happen if we replaced the open interval (−1, 1) with 
the closed interval [−1, 1]? 


Exercise 3.2.3 Let $(X, d_X)$ be a metric space, and for every integer n ≥ 1, let $f_n : X \to \mathbf{R}$ 
be a real-valued function. Suppose that $f_n$ converges pointwise to another function 
f : X → R on X (in this question we give R the standard metric d(x, y) = 
|x − y|). Let h : R → R be a continuous function. Show that the functions $h \circ f_n$ 
converge pointwise to h ∘ f on X, where $h \circ f_n : X \to \mathbf{R}$ is the function 
$h \circ f_n(x) := h(f_n(x))$, and similarly for h ∘ f. 


Exercise 3.2.4 Let $f_n : X \to Y$ be a sequence of bounded functions from one metric 
space $(X, d_X)$ to another metric space $(Y, d_Y)$. Suppose that $f_n$ converges uniformly 
to another function f : X → Y. Suppose that f is a bounded function; i.e., there 
exists a ball $B_{(Y,d_Y)}(y_0, R)$ in Y such that $f(x) \in B_{(Y,d_Y)}(y_0, R)$ for all x ∈ X. Show 
that the sequence $f_n$ is uniformly bounded; i.e., there exists a ball $B_{(Y,d_Y)}(y_0, R)$ in 
Y such that $f_n(x) \in B_{(Y,d_Y)}(y_0, R)$ for all x ∈ X and all positive integers n. 


3.3 Uniform Convergence and Continuity 


We now give the first demonstration that uniform convergence is significantly better 
than pointwise convergence. Specifically, we show that the uniform limit of contin- 
uous functions is continuous. 


Theorem 3.3.1 (Uniform limits preserve continuity I) Suppose $(f^{(n)})_{n=1}^\infty$ is a 
sequence of functions from one metric space $(X, d_X)$ to another $(Y, d_Y)$, and suppose 
that this sequence converges uniformly to another function f : X → Y. Let $x_0$ be a 
point in X. If the functions $f^{(n)}$ are continuous at $x_0$ for each n, then the limiting 
function f is also continuous at $x_0$. 


Proof See Exercise 3.3.1. □ 

This has an immediate corollary: 


Corollary 3.3.2 (Uniform limits preserve continuity II) Let $(f^{(n)})_{n=1}^\infty$ be a sequence 
of functions from one metric space $(X, d_X)$ to another $(Y, d_Y)$, and suppose that this 
sequence converges uniformly to another function f : X → Y. If the functions $f^{(n)}$ 
are continuous on X for each n, then the limiting function f is also continuous on 
X. 


This should be contrasted with Example 3.2.4. There is a slight variant of Theorem 
3.3.1 which is also useful: 


Proposition 3.3.3 (Interchange of limits and uniform limits) Let $(X, d_X)$ and $(Y, d_Y)$ 
be metric spaces, with Y complete, and let E be a subset of X. Let $(f^{(n)})_{n=1}^\infty$ 
be a sequence of functions from E to Y, and suppose that this sequence converges 
uniformly in E to some function f : E → Y. Let $x_0 \in X$ be an adherent 
point of E, and suppose that for each n the limit $\lim_{x \to x_0; x \in E} f^{(n)}(x)$ exists. Then 
the limit $\lim_{x \to x_0; x \in E} f(x)$ also exists, and is equal to the limit of the sequence 
$(\lim_{x \to x_0; x \in E} f^{(n)}(x))_{n=1}^\infty$; in other words we have the interchange of limits 

$$\lim_{n \to \infty} \lim_{x \to x_0; x \in E} f^{(n)}(x) = \lim_{x \to x_0; x \in E} \lim_{n \to \infty} f^{(n)}(x).$$

Proof See Exercise 3.3.2. □ 


This should be contrasted with Example 3.2.5. Finally, we have a version of these 
theorems for sequences: 


Proposition 3.3.4 Let $(f^{(n)})_{n=1}^\infty$ be a sequence of continuous functions from one
metric space $(X, d_X)$ to another $(Y, d_Y)$, and suppose that this sequence converges
uniformly to another function $f: X \to Y$. Let $x^{(n)}$ be a sequence of points in $X$
which converge to some limit $x$. Then $f^{(n)}(x^{(n)})$ converges (in $Y$) to $f(x)$.


Proof See Exercise 3.3.4. □


A similar result holds for bounded functions:


Definition 3.3.5 (Bounded functions) A function $f: X \to Y$ from one metric space
$(X, d_X)$ to another $(Y, d_Y)$ is bounded if $f(X)$ is a bounded set, i.e., there exists a
ball $B_{(Y,d_Y)}(y_0, R)$ in $Y$ such that $f(x) \in B_{(Y,d_Y)}(y_0, R)$ for all $x \in X$.


Proposition 3.3.6 (Uniform limits preserve boundedness) Let $(f^{(n)})_{n=1}^\infty$ be a
sequence of functions from one metric space $(X, d_X)$ to another $(Y, d_Y)$, and suppose
that this sequence converges uniformly to another function $f: X \to Y$. If the functions
$f^{(n)}$ are bounded on $X$ for each $n$, then the limiting function $f$ is also bounded
on $X$.


Proof See Exercise 3.3.6. □


Remark 3.3.7 The above propositions sound very reasonable, but one should be cautioned
that they only hold if one assumes uniform convergence; pointwise convergence is not
enough. (See Exercises 3.3.3, 3.3.5 and 3.3.7.)
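

To make the warning in Remark 3.3.7 concrete, here is a small numerical sketch. It assumes the familiar sequence $f^{(n)}(x) := x^n$ on $[0, 1]$ as the illustration (the helper names are hypothetical). Each $f^{(n)}$ is continuous, the pointwise limit is not, and the point $x = 2^{-1/n}$ witnesses that the sup distance to the limit never drops below $1/2$, so the convergence cannot be uniform.

```python
def f_n(n, x):
    """n-th function of the sequence f_n(x) = x**n on [0, 1]."""
    return x ** n


def pointwise_limit(x):
    """Pointwise limit on [0, 1]: 0 for x < 1 and 1 at x = 1 (discontinuous)."""
    return 1.0 if x == 1.0 else 0.0


def witness_point(n):
    """A point x in [0, 1) with x**n = 1/2, so sup |f_n - f| >= 1/2."""
    return 0.5 ** (1.0 / n)


for n in (1, 10, 100, 1000):
    x = witness_point(n)
    gap = abs(f_n(n, x) - pointwise_limit(x))
    print(n, x, gap)  # the gap stays near 0.5 however large n is
```

The witness point moves towards 1 as $n$ grows, which is exactly the behaviour that uniform convergence rules out.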


— Exercises — 


Exercise 3.3.1 Prove Theorem 3.3.1. Explain briefly why your proof requires uniform
convergence, and why pointwise convergence would not suffice. (Hints: it is
easiest to use the “epsilon-delta” definition of continuity from Definition 2.1.1. You
may find the triangle inequality

$$d_Y(f(x), f(x_0)) \leq d_Y(f(x), f^{(n)}(x)) + d_Y(f^{(n)}(x), f^{(n)}(x_0)) + d_Y(f^{(n)}(x_0), f(x_0))$$

useful. Also, you may need to divide $\varepsilon$ as $\varepsilon = \varepsilon/3 + \varepsilon/3 + \varepsilon/3$. Finally, it is
possible to prove Theorem 3.3.1 from Proposition 3.3.3, but you may find it easier
conceptually to prove Theorem 3.3.1 first.)




Exercise 3.3.2 Prove Proposition 3.3.3. (Hint: this is very similar to Theorem 3.3.1. 
Theorem 3.3.1 cannot be used to prove Proposition 3.3.3, however it is possible to 
use Proposition 3.3.3 to prove Theorem 3.3.1.) 


Exercise 3.3.3 Compare Proposition 3.3.3 with Example 1.2.8. Can you now 
explain why the interchange of limits in Example 1.2.8 led to a false statement, 
whereas the interchange of limits in Proposition 3.3.3 is justified? 


Exercise 3.3.4 Prove Proposition 3.3.4. (Hint: again, this is similar to Theorem 3.3.1 
and Proposition 3.3.3, although the statements are slightly different, and one cannot 
deduce this directly from the other two results.) 


Exercise 3.3.5 Give an example to show that Proposition 3.3.4 fails if the phrase 
“converges uniformly” is replaced by “converges pointwise”. (Hint: some of the 
examples already given earlier will already work here.) 


Exercise 3.3.6 Prove Proposition 3.3.6. Discuss how this proposition differs from 
Exercise 3.2.4. 


Exercise 3.3.7 Give an example to show that Proposition 3.3.6 fails if the phrase 
“converges uniformly” is replaced by “converges pointwise”. (Hint: some of the 
examples already given earlier will already work here.) 


Exercise 3.3.8 Let $(X, d)$ be a metric space, and for every positive integer $n$, let
$f_n: X \to \mathbf{R}$ and $g_n: X \to \mathbf{R}$ be functions. Suppose that $(f_n)_{n=1}^\infty$ converges uniformly
to another function $f: X \to \mathbf{R}$, and that $(g_n)_{n=1}^\infty$ converges uniformly to another
function $g: X \to \mathbf{R}$. Suppose also that the functions $(f_n)_{n=1}^\infty$ and $(g_n)_{n=1}^\infty$ are
uniformly bounded, i.e., there exists an $M > 0$ such that $|f_n(x)| \leq M$ and $|g_n(x)| \leq M$
for all $n \geq 1$ and $x \in X$. Prove that the functions $f_n g_n: X \to \mathbf{R}$ converge uniformly
to $fg: X \to \mathbf{R}$.


3.4 The Metric of Uniform Convergence 


We have now developed at least four, apparently separate, notions of limit in this 
text: 


(a) limits $\lim_{n \to \infty} x^{(n)}$ of sequences of points in a metric space (Definition 1.1.14;
see also Definition 2.5.4);

(b) limiting values $\lim_{x \to x_0; x \in E} f(x)$ of functions at a point (Definition 3.1.1);

(c) pointwise limits $f$ of functions $f^{(n)}$ (Definition 3.2.1); and

(d) uniform limits $f$ of functions $f^{(n)}$ (Definition 3.2.7).


This proliferation of limits may seem rather complicated. However, we can reduce 
the complexity slightly by observing that (d) can be viewed as a special case of (a), 
though in doing so it should be cautioned that because we are now dealing with 
functions instead of points, the convergence is not in X or in Y, but rather in a new 
space, the space of functions from $X$ to $Y$.


Remark 3.4.1 If one is willing to work in topological spaces instead of metric spaces,
we can also view (a) as a special case of (b), see Exercise 3.1.4, and (c) is also a
special case of (a), see Exercise 3.4.4. Thus the notion of convergence in a topological
space can be used to unify all the notions of limits we have encountered so far.


Definition 3.4.2 (Metric space of bounded functions) Suppose $(X, d_X)$ and $(Y, d_Y)$
are metric spaces. We let $B(X \to Y)$ denote the space¹ of bounded functions from
$X$ to $Y$:

$$B(X \to Y) := \{ f \mid f: X \to Y \text{ is a bounded function} \}.$$


If $X$ is non-empty, we define a metric $d_\infty: B(X \to Y) \times B(X \to Y) \to [0, +\infty)$
by defining

$$d_\infty(f, g) := \sup_{x \in X} d_Y(f(x), g(x)) = \sup\{ d_Y(f(x), g(x)) : x \in X \}$$

for all $f, g \in B(X \to Y)$. This metric is sometimes known as the uniform metric,
sup norm metric or the $L^\infty$ metric. We will also use $d_{B(X \to Y)}$ as a synonym for $d_\infty$.
If $X$ is empty, we instead define $d_\infty(f, g) := 0$.


Notice that the distance $d_\infty(f, g)$ is always finite because $f$ and $g$ are assumed
to be bounded on $X$.


Example 3.4.3 Let $X := [0, 1]$ and $Y := \mathbf{R}$. Let $f: [0, 1] \to \mathbf{R}$ and $g: [0, 1] \to \mathbf{R}$
be the functions $f(x) := 2x$ and $g(x) := 3x$. Then $f$ and $g$ are both bounded functions
and thus live in $B([0, 1] \to \mathbf{R})$. The distance between them is

$$d_\infty(f, g) = \sup_{x \in [0,1]} |2x - 3x| = \sup_{x \in [0,1]} |x| = 1.$$
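

The computation in Example 3.4.3 can be imitated numerically. The sketch below is only an approximation under a stated assumption: it replaces the supremum over $[0, 1]$ by a maximum over a finite grid (the helper name `sup_distance` is hypothetical). In general such a grid maximum only bounds the true supremum from below; here the supremum is attained at the grid point $x = 1$.

```python
def sup_distance(f, g, points):
    """Grid approximation of the uniform metric: max |f(x) - g(x)| over the sample points."""
    return max(abs(f(x) - g(x)) for x in points)


# Sample [0, 1] at 1001 equally spaced points, including both endpoints.
grid = [i / 1000 for i in range(1001)]

f = lambda x: 2 * x
g = lambda x: 3 * x

print(sup_distance(f, g, grid))  # agrees with the value 1 found in Example 3.4.3
```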


This space turns out to be a metric space (Exercise 3.4.1). Convergence in this 
metric turns out to be identical to uniform convergence: 


Proposition 3.4.4 Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces. Let $(f^{(n)})_{n=1}^\infty$ be a
sequence of functions in $B(X \to Y)$, and let $f$ be another function in $B(X \to Y)$.
Then $(f^{(n)})_{n=1}^\infty$ converges to $f$ in the metric $d_{B(X \to Y)}$ if and only if $(f^{(n)})_{n=1}^\infty$ converges
uniformly to $f$.


Proof See Exercise 3.4.2. □


Now let $C(X \to Y)$ be the space of bounded continuous functions from $X$ to $Y$:

$$C(X \to Y) := \{ f \in B(X \to Y) \mid f \text{ is continuous} \}.$$


¹ Note that $B(X \to Y)$ is a set, thanks to the power set axiom (Axiom 3.11) and the axiom of specification
(Axiom 3.6).


This set $C(X \to Y)$ is clearly a subset of $B(X \to Y)$. Corollary 3.3.2 asserts that
this space $C(X \to Y)$ is closed in $B(X \to Y)$ (why?). Actually, we can say a lot
more:


Theorem 3.4.5 (The space of continuous functions is complete) Let $(X, d_X)$ be
a metric space, and let $(Y, d_Y)$ be a complete metric space. The space
$(C(X \to Y), d_{B(X \to Y)}|_{C(X \to Y) \times C(X \to Y)})$ is a complete subspace of $(B(X \to Y), d_{B(X \to Y)})$. In
other words, every Cauchy sequence of functions in $C(X \to Y)$ converges to a function
in $C(X \to Y)$.


Proof See Exercise 3.4.3. □
— Exercises — 


Exercise 3.4.1 Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces. Show that the space
$B(X \to Y)$ defined in Definition 3.4.2, with the metric $d_{B(X \to Y)}$, is indeed a metric
space.


Exercise 3.4.2 Prove Proposition 3.4.4. 


Exercise 3.4.3 Prove Theorem 3.4.5. (Hint: this is similar, but not identical, to the
proof of Theorem 3.3.1.)


Exercise 3.4.4 Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, and let $Y^X := \{ f \mid f: X \to Y \}$
be the space of all functions from $X$ to $Y$ (cf. Axiom 3.11). If $x_0 \in X$ and $V$ is
an open set in $Y$, let $V^{(x_0)} \subseteq Y^X$ be the set

$$V^{(x_0)} := \{ f \in Y^X : f(x_0) \in V \}.$$


If $E$ is a subset of $Y^X$, we say that $E$ is open if for every $f \in E$, there exists a finite
number of points $x_1, \dots, x_n \in X$ and open sets $V_1, \dots, V_n \subseteq Y$ such that

$$f \in V_1^{(x_1)} \cap \dots \cap V_n^{(x_n)} \subseteq E.$$


(a) Show that if $\mathcal{F}$ is the collection of open sets in $Y^X$, then $(Y^X, \mathcal{F})$ is a topological
space.

(b) For each natural number $n$, let $f^{(n)}: X \to Y$ be a function from $X$ to $Y$, and let
$f: X \to Y$ be another function from $X$ to $Y$. Show that $f^{(n)}$ converges to $f$ in
the topology $\mathcal{F}$ (in the sense of Definition 2.5.4) if and only if $f^{(n)}$ converges to
$f$ pointwise (in the sense of Definition 3.2.1).


The topology $\mathcal{F}$ is known as the topology of pointwise convergence, for obvious
reasons; it is also known as the product topology. It shows that the concept of pointwise
convergence can be viewed as a special case of the more general concept of
convergence in a topological space.




3.5 Series of Functions; the Weierstrass M-Test 


Having discussed sequences of functions, we now discuss infinite series $\sum_{n=1}^\infty f_n$ of
functions. Now we shall restrict our attention to functions $f: X \to \mathbf{R}$ from a metric
space $(X, d_X)$ to the real line $\mathbf{R}$ (which we of course give the standard metric); this
is because we know how to add two real numbers, but don’t necessarily know how
to add two points in a general metric space $Y$. Functions whose codomain is $\mathbf{R}$ are
sometimes called real-valued functions.

Finite summation is, of course, easy: given any finite collection $f^{(1)}, \dots, f^{(N)}$ of
functions from $X$ to $\mathbf{R}$, we can define the finite sum $\sum_{i=1}^N f^{(i)}: X \to \mathbf{R}$ by

$$\left( \sum_{i=1}^N f^{(i)} \right)(x) := \sum_{i=1}^N f^{(i)}(x).$$


Example 3.5.1 If $f^{(1)}: \mathbf{R} \to \mathbf{R}$ is the function $f^{(1)}(x) := x$, $f^{(2)}: \mathbf{R} \to \mathbf{R}$ is the
function $f^{(2)}(x) := x^2$, and $f^{(3)}: \mathbf{R} \to \mathbf{R}$ is the function $f^{(3)}(x) := x^3$, then
$f := \sum_{i=1}^3 f^{(i)}$ is the function $f: \mathbf{R} \to \mathbf{R}$ defined by $f(x) := x + x^2 + x^3$.


It is easy to show that finite sums of bounded functions are bounded, and finite 
sums of continuous functions are continuous (Exercise 3.5.1). 
Now to add infinite series. 


Definition 3.5.2 (Infinite series) Let $(X, d_X)$ be a metric space. Let $(f^{(n)})_{n=1}^\infty$ be
a sequence of functions from $X$ to $\mathbf{R}$, and let $f$ be another function from $X$
to $\mathbf{R}$. If the partial sums $\sum_{n=1}^N f^{(n)}$ converge pointwise to $f$ on $X$ as $N \to \infty$,
we say that the infinite series $\sum_{n=1}^\infty f^{(n)}$ converges pointwise to $f$, and write
$f = \sum_{n=1}^\infty f^{(n)}$. If the partial sums $\sum_{n=1}^N f^{(n)}$ converge uniformly to $f$ on $X$
as $N \to \infty$, we say that the infinite series $\sum_{n=1}^\infty f^{(n)}$ converges uniformly to
$f$, and again write $f = \sum_{n=1}^\infty f^{(n)}$. (Thus when one sees an expression such as
$f = \sum_{n=1}^\infty f^{(n)}$, one should look at the context to see in what sense this infinite series
converges.)


Remark 3.5.3 A series $\sum_{n=1}^\infty f^{(n)}$ converges pointwise to $f$ on $X$ if and only if
$\sum_{n=1}^\infty f^{(n)}(x)$ converges to $f(x)$ for every $x \in X$. (Thus if $\sum_{n=1}^\infty f^{(n)}$ does not
converge pointwise to $f$, this does not mean that it diverges pointwise; it may just be
that it converges for some points $x$ but diverges at other points $x$.)


If a series $\sum_{n=1}^\infty f^{(n)}$ converges uniformly to $f$, then it also converges pointwise
to $f$; but not vice versa, as the following example shows.


Example 3.5.4 Let $f^{(n)}: (-1, 1) \to \mathbf{R}$ be the sequence of functions $f^{(n)}(x) := x^n$.
Then $\sum_{n=1}^\infty f^{(n)}$ converges pointwise, but not uniformly, to the function $x/(1 - x)$
(see Exercise 3.2.2 and Example 3.5.8).


It is not always clear when a series $\sum_{n=1}^\infty f^{(n)}$ converges or not. However, there
is a very useful test that gives at least one sufficient condition for uniform convergence.


Definition 3.5.5 (Sup norm) If $f: X \to \mathbf{R}$ is a bounded real-valued function, and
$X$ is non-empty, we define the sup norm $\|f\|_\infty$ of $f$ to be the number

$$\|f\|_\infty := \sup\{ |f(x)| : x \in X \}.$$


In other words, $\|f\|_\infty = d_\infty(f, 0)$, where $0: X \to \mathbf{R}$ is the zero function $0(x) := 0$,
and $d_\infty$ was defined in Definition 3.4.2. (Why is this the case?) If $X$ is empty, we
instead define $\|f\|_\infty := 0$.


Example 3.5.6 Thus, for instance, if $f: (-2, 1) \to \mathbf{R}$ is the function $f(x) := 2x$,
then $\|f\|_\infty = \sup\{ |2x| : x \in (-2, 1) \} = 4$ (why?). Notice that when $f$ is bounded
then $\|f\|_\infty$ will always be a non-negative real number.


Theorem 3.5.7 (Weierstrass M-test) Let $(X, d)$ be a metric space, and let $(f^{(n)})_{n=1}^\infty$
be a sequence of bounded real-valued continuous functions on $X$ such that the series
$\sum_{n=1}^\infty \|f^{(n)}\|_\infty$ is convergent. (Note that this is a series of plain old real numbers,
not of functions.) Then the series $\sum_{n=1}^\infty f^{(n)}$ converges uniformly to some function
$f$ on $X$, and that function $f$ is also continuous.


Proof See Exercise 3.5.2. □


To put the Weierstrass M-test succinctly: absolute convergence of sup norms 
implies uniform convergence of functions. 


Example 3.5.8 Let $0 < r < 1$ be a real number, and let $f^{(n)}: [-r, r] \to \mathbf{R}$ be the
sequence of functions $f^{(n)}(x) := x^n$. Then each $f^{(n)}$ is continuous and bounded, and
$\|f^{(n)}\|_\infty = r^n$ (why?). Since the series $\sum_{n=1}^\infty r^n$ is absolutely convergent (e.g., by
the root test, Theorem 7.5.1 from Analysis I), we thus see that $\sum_{n=1}^\infty f^{(n)}$ converges
uniformly in $[-r, r]$ to some continuous function; in Exercise 3.2.2(c) we see that
this function must in fact be the function $f: [-r, r] \to \mathbf{R}$ defined by $f(x) := x/(1 - x)$.
In other words, the series $\sum_{n=1}^\infty x^n$ is pointwise convergent, but not uniformly
convergent, on $(-1, 1)$, but is uniformly convergent on the smaller interval $[-r, r]$
for any $0 < r < 1$.
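

The estimate behind Example 3.5.8 can be checked numerically. The following is a minimal sketch, assuming a finite grid stands in for $[-r, r]$ (all function names are hypothetical): the uniform error of the $N$-th partial sum is compared with the tail $\sum_{n > N} r^n = r^{N+1}/(1 - r)$ of the series of sup norms.

```python
def partial_sum(N, x):
    """N-th partial sum x + x^2 + ... + x^N of the geometric series."""
    return sum(x ** n for n in range(1, N + 1))


def limit(x):
    """Pointwise limit x / (1 - x) on (-1, 1)."""
    return x / (1.0 - x)


def tail_bound(N, r):
    """Tail of the sup norms: sum over n > N of r^n = r^(N+1) / (1 - r)."""
    return r ** (N + 1) / (1.0 - r)


def uniform_error(N, r, samples=2001):
    """Grid approximation of sup over [-r, r] of |partial_sum - limit|."""
    grid = [-r + 2.0 * r * i / (samples - 1) for i in range(samples)]
    return max(abs(partial_sum(N, x) - limit(x)) for x in grid)


r = 0.9
for N in (5, 20, 50):
    print(N, uniform_error(N, r), tail_bound(N, r))
```

For fixed $r < 1$ the bound tends to $0$ as $N \to \infty$, but it blows up as $r \to 1$, which is consistent with the failure of uniform convergence on all of $(-1, 1)$.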


The Weierstrass M-test is especially useful in relation to power series, which we 
will encounter in the next chapter. 


— Exercises — 


Exercise 3.5.1 Let $f^{(1)}, \dots, f^{(N)}$ be a finite sequence of bounded functions from a
metric space $(X, d_X)$ to $\mathbf{R}$. Show that $\sum_{i=1}^N f^{(i)}$ is also bounded. Prove a similar claim
when “bounded” is replaced by “continuous”. What if “continuous” was replaced by
“uniformly continuous”?


Exercise 3.5.2 Prove Theorem 3.5.7. (Hint: first show that the sequence $\sum_{n=1}^N f^{(n)}$
is a Cauchy sequence in $C(X \to \mathbf{R})$. Then use Theorem 3.4.5.)


3.6 Uniform Convergence and Integration


We now connect uniform convergence with Riemann integration (which was dis- 
cussed in Chap. 11), by showing that uniform limits can be safely interchanged with 
integrals. 


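Theorem 3.6.1 Let $[a, b]$ be an interval, and for each integer $n \geq 1$, let $f^{(n)}: [a, b] \to \mathbf{R}$
be a Riemann-integrable function. Suppose $f^{(n)}$ converges uniformly on $[a, b]$
to a function $f: [a, b] \to \mathbf{R}$. Then $f$ is also Riemann integrable, and

$$\lim_{n \to \infty} \int_{[a,b]} f^{(n)} = \int_{[a,b]} f.$$


Proof We first show that $f$ is Riemann integrable on $[a, b]$. This is the same as
showing that the upper and lower Riemann integrals of $f$ match: $\overline{\int}_{[a,b]} f = \underline{\int}_{[a,b]} f$.

Let $\varepsilon > 0$. Since $f^{(n)}$ converges uniformly to $f$, we see that there exists an $N > 0$
such that $|f^{(n)}(x) - f(x)| < \varepsilon$ for all $n > N$ and $x \in [a, b]$. In particular we have

$$f^{(n)}(x) - \varepsilon < f(x) < f^{(n)}(x) + \varepsilon$$

for all $x \in [a, b]$. Integrating this on $[a, b]$ we obtain

$$\underline{\int}_{[a,b]} (f^{(n)} - \varepsilon) \leq \underline{\int}_{[a,b]} f \leq \overline{\int}_{[a,b]} f \leq \overline{\int}_{[a,b]} (f^{(n)} + \varepsilon).$$

Since $f^{(n)}$ is assumed to be Riemann integrable, we thus see

$$\left( \int_{[a,b]} f^{(n)} \right) - \varepsilon(b - a) \leq \underline{\int}_{[a,b]} f \leq \overline{\int}_{[a,b]} f \leq \left( \int_{[a,b]} f^{(n)} \right) + \varepsilon(b - a).$$

In particular, we see that

$$0 \leq \overline{\int}_{[a,b]} f - \underline{\int}_{[a,b]} f \leq 2\varepsilon(b - a).$$

Since this is true for every $\varepsilon > 0$, we obtain $\underline{\int}_{[a,b]} f = \overline{\int}_{[a,b]} f$ as desired.

The above argument also shows that for every $\varepsilon > 0$ there exists an $N > 0$ such
that

$$\left| \int_{[a,b]} f^{(n)} - \int_{[a,b]} f \right| \leq \varepsilon(b - a)$$

for all $n > N$. Since $\varepsilon$ is arbitrary, we see that $\int_{[a,b]} f^{(n)}$ converges to $\int_{[a,b]} f$ as
desired. □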

To rephrase Theorem 3.6.1: we can rearrange limits and integrals (on compact
intervals $[a, b]$),

$$\lim_{n \to \infty} \int_{[a,b]} f^{(n)} = \int_{[a,b]} \lim_{n \to \infty} f^{(n)},$$

provided that the convergence is uniform. This should be contrasted with Examples
1.2.9 and 3.2.5.

There is an analogue of this theorem for series:


Corollary 3.6.2 Let $[a, b]$ be an interval, and let $(f^{(n)})_{n=1}^\infty$ be a sequence of
Riemann-integrable functions on $[a, b]$ such that the series $\sum_{n=1}^\infty f^{(n)}$ is uniformly
convergent. Then we have

$$\sum_{n=1}^\infty \int_{[a,b]} f^{(n)} = \int_{[a,b]} \sum_{n=1}^\infty f^{(n)}.$$


Proof See Exercise 3.6.1. □


This corollary works particularly well in conjunction with the Weierstrass M-test 
(Theorem 3.5.7): 


Example 3.6.3 (Informal) From Lemma 7.3.3 of Analysis I we have the geometric
series identity

$$\sum_{n=1}^\infty x^n = \frac{x}{1 - x}$$

for $x \in (-1, 1)$, and the convergence is uniform (by the Weierstrass M-test) on
$[-r, r]$ for any $0 < r < 1$. By adding 1 to both sides we obtain

$$\sum_{n=0}^\infty x^n = \frac{1}{1 - x}$$

and the convergence is again uniform. We can thus integrate on $[0, r]$ and use Corollary
3.6.2 to obtain

$$\sum_{n=0}^\infty \int_{[0,r]} x^n \, dx = \int_{[0,r]} \frac{1}{1 - x} \, dx.$$

The left-hand side is $\sum_{n=0}^\infty r^{n+1}/(n + 1)$. If we accept for now the use of logarithms
(we will justify this use in Sect. 4.5), the anti-derivative of $1/(1 - x)$ is $-\log(1 - x)$,
and so the right-hand side is $-\log(1 - r)$. We thus obtain the formula

$$-\log(1 - r) = \sum_{n=0}^\infty r^{n+1}/(n + 1)$$

for all $0 < r < 1$.
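

As a sanity check on the formula of Example 3.6.3, one can compare a truncated version of the series with the logarithm supplied by Python's standard library. This is only a numerical sketch: the function name `log_series` and the truncation length are assumptions of the illustration, and floating-point agreement is of course not a proof.

```python
import math


def log_series(r, terms=200):
    """Truncated series: sum of r^(n+1) / (n+1) for n = 0, ..., terms - 1."""
    return sum(r ** (n + 1) / (n + 1) for n in range(terms))


for r in (0.1, 0.5, 0.9):
    # The two printed values should agree to many decimal places.
    print(r, log_series(r), -math.log(1.0 - r))
```

The closer $r$ is to 1, the more terms are needed for a given accuracy, in line with the convergence being uniform only on intervals $[0, r]$ with $r < 1$.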
— Exercises — 


Exercise 3.6.1 Use Theorem 3.6.1 to prove Corollary 3.6.2. 


3.7 Uniform Convergence and Derivatives 


We have already seen how uniform convergence interacts well with continuity, with 
limits, and with integrals. Now we investigate how it interacts with derivatives. 

The first question we can ask is: if $f_n$ converges uniformly to $f$, and the functions
$f_n$ are differentiable, does this imply that $f$ is also differentiable? And does $f_n'$ also
converge to $f'$?

The answer to the second question is, unfortunately, no. To see a counterexample,
we will use without proof some basic facts about trigonometric functions (which we
will make rigorous in Sect. 4.7). Consider the functions $f_n: [0, 2\pi] \to \mathbf{R}$ defined
by $f_n(x) := n^{-1/2} \sin(nx)$, and let $f: [0, 2\pi] \to \mathbf{R}$ be the zero function $f(x) := 0$.
Then, since $\sin$ takes values between $-1$ and $1$, we have $d_\infty(f_n, f) \leq n^{-1/2}$, where
we use the uniform metric $d_\infty(f, g) = \sup_{x \in [0,2\pi]} |f(x) - g(x)|$ introduced in
Definition 3.4.2. Since $n^{-1/2}$ converges to 0, we thus see by the squeeze test that $f_n$
converges uniformly to $f$. On the other hand, $f_n'(x) = n^{1/2} \cos(nx)$, and so in
particular $|f_n'(0) - f'(0)| = n^{1/2}$. Thus $f_n'$ does not converge pointwise to $f'$, and so in
particular does not converge uniformly either. In particular we have

$$\frac{d}{dx} \lim_{n \to \infty} f_n(x) \neq \lim_{n \to \infty} \frac{d}{dx} f_n(x).$$
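

The two competing effects in this counterexample, the functions shrinking while their derivatives at $0$ grow, can be tabulated directly. The sketch below assumes a finite grid in place of $[0, 2\pi]$ and uses hypothetical helper names; the derivative formula is the one stated in the text.

```python
import math


def f_n(n, x):
    """f_n(x) = n^(-1/2) sin(n x)."""
    return math.sin(n * x) / math.sqrt(n)


def df_n(n, x):
    """Derivative f_n'(x) = n^(1/2) cos(n x)."""
    return math.sqrt(n) * math.cos(n * x)


def sup_norm_on_grid(n, samples=10001):
    """Grid approximation of sup over [0, 2 pi] of |f_n(x)|."""
    grid = [2.0 * math.pi * i / (samples - 1) for i in range(samples)]
    return max(abs(f_n(n, x)) for x in grid)


for n in (1, 4, 100, 10000):
    # First column tends to 0 (uniform convergence); second column is n^(1/2).
    print(n, sup_norm_on_grid(n), df_n(n, 0.0))
```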


The answer to the first question is also no. An example is the sequence of functions
$f_n: [-1, 1] \to \mathbf{R}$ defined by $f_n(x) := \sqrt{\frac{1}{n^2} + x^2}$. These functions are differentiable
(why?). Also, one can easily check that

$$|x| \leq f_n(x) \leq |x| + \frac{1}{n}$$

for all $x \in [-1, 1]$ (why? square both sides), and so by the squeeze test $f_n$ converges
uniformly to the absolute value function $f(x) := |x|$. But this function is not
differentiable at 0 (why?). Thus, the uniform limit of differentiable functions need not be
differentiable. (See also Example 1.2.10.)
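

The squeeze $|x| \leq f_n(x) \leq |x| + 1/n$ is easy to test on sample points. The following sketch (hypothetical names, finite grid, small floating-point tolerance) checks the two inequalities and also shows the one-sided difference quotients of $|x|$ at $0$ disagreeing, which is the reason the limit is not differentiable there.

```python
import math


def f_n(n, x):
    """f_n(x) = sqrt(1/n^2 + x^2), a smooth approximation of |x|."""
    return math.sqrt(1.0 / n ** 2 + x ** 2)


def squeeze_holds(n, samples=2001, tol=1e-12):
    """Check |x| <= f_n(x) <= |x| + 1/n on a grid of [-1, 1], up to rounding."""
    grid = [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]
    return all(abs(x) - tol <= f_n(n, x) <= abs(x) + 1.0 / n + tol for x in grid)


def abs_difference_quotient(h):
    """Difference quotient (|0 + h| - |0|) / h of the limit function at 0."""
    return (abs(h) - abs(0.0)) / h


print([squeeze_holds(n) for n in (1, 10, 1000)])
print(abs_difference_quotient(1e-6), abs_difference_quotient(-1e-6))  # +1 and -1
```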

So, in summary, uniform convergence of the functions $f_n$ says nothing about
the convergence of the derivatives $f_n'$. However, the converse is true, as long as $f_n$
converges at some point:


Theorem 3.7.1 Let $[a, b]$ be an interval, and for every integer $n \geq 1$, let $f_n: [a, b] \to \mathbf{R}$
be a differentiable function whose derivative $f_n': [a, b] \to \mathbf{R}$ is continuous.
Suppose that the derivatives $f_n'$ converge uniformly to a function $g: [a, b] \to \mathbf{R}$.
Suppose also that there exists a point $x_0$ such that the limit $\lim_{n \to \infty} f_n(x_0)$ exists.
Then the functions $f_n$ converge uniformly to a differentiable function $f$, and the
derivative of $f$ equals $g$.


Informally, the above theorem says that if $f_n'$ converges uniformly, and $f_n(x_0)$
converges for some $x_0$, then $f_n$ also converges uniformly, and $\frac{d}{dx} \lim_{n \to \infty} f_n(x) =
\lim_{n \to \infty} \frac{d}{dx} f_n(x)$.

Proof We only give the beginning of the proof here; the remainder of the proof will
be an exercise (Exercise 3.7.1).

Since $f_n'$ is continuous, we see from the fundamental theorem of calculus
(Theorem 11.9.4) that

$$f_n(x) - f_n(x_0) = \int_{[x_0,x]} f_n'$$

when $x \in [x_0, b]$, and

$$f_n(x) - f_n(x_0) = -\int_{[x,x_0]} f_n'$$

when $x \in [a, x_0]$.

Let $L$ be the limit of $f_n(x_0)$ as $n \to \infty$:

$$L := \lim_{n \to \infty} f_n(x_0).$$

By hypothesis, $L$ exists. Now, since each $f_n'$ is continuous on $[a, b]$, and $f_n'$ converges
uniformly to $g$, we see by Corollary 3.3.2 that $g$ is also continuous. Now define the
function $f: [a, b] \to \mathbf{R}$ by setting

$$f(x) := L - \int_{[a,x_0]} g + \int_{[a,x]} g$$

for all $x \in [a, b]$. To finish the proof, we have to show that $f_n$ converges uniformly
to $f$, and that $f$ is differentiable with derivative $g$; this shall be done in Exercise
3.7.1. □


Remark 3.7.2 It turns out that Theorem 3.7.1 is still true when the functions $f_n'$ are
not assumed to be continuous, but the proof is more difficult; see Exercise 3.7.2.


By combining this theorem with the Weierstrass M-test, we obtain


Corollary 3.7.3 Let $[a, b]$ be an interval, and for every integer $n \geq 1$, let $f_n: [a, b] \to \mathbf{R}$
be a differentiable function whose derivative $f_n': [a, b] \to \mathbf{R}$ is continuous.
Suppose that the series $\sum_{n=1}^\infty \|f_n'\|_\infty$ is absolutely convergent, where

$$\|f_n'\|_\infty := \sup_{x \in [a,b]} |f_n'(x)|$$

is the sup norm of $f_n'$, as defined in Definition 3.5.5. Suppose also that the series
$\sum_{n=1}^\infty f_n(x_0)$ is convergent for some $x_0 \in [a, b]$. Then the series $\sum_{n=1}^\infty f_n$ converges
uniformly on $[a, b]$ to a differentiable function, and in fact

$$\frac{d}{dx} \sum_{n=1}^\infty f_n(x) = \sum_{n=1}^\infty \frac{d}{dx} f_n(x)$$

for all $x \in [a, b]$.


Proof See Exercise 3.7.3. □


We now pause to give an example of a function which is continuous everywhere, 
but differentiable nowhere (this particular example was discovered by Weierstrass). 
Again, we will presume knowledge of the trigonometric functions, which will be 
covered rigorously in Sect. 4.7. 


Example 3.7.4 Let $f: \mathbf{R} \to \mathbf{R}$ be the function

$$f(x) := \sum_{n=1}^\infty 4^{-n} \cos(32^n \pi x).$$

Note that this series is uniformly convergent, thanks to the Weierstrass M-test, and
since each individual function $4^{-n} \cos(32^n \pi x)$ is continuous, the function $f$ is also
continuous. However, it is not differentiable (Exercise 4.7.10); in fact it is a nowhere
differentiable function, one which is not differentiable at any point, despite being
continuous everywhere!
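

The uniform convergence claimed in Example 3.7.4 comes with an explicit error bound from the M-test: since $\|4^{-n} \cos(32^n \pi x)\|_\infty \leq 4^{-n}$, the $N$-th partial sum differs from $f$ by at most $\sum_{n > N} 4^{-n} = 4^{-N}/3$ at every point. The sketch below (hypothetical names) evaluates a few partial sums; it assumes $N$ is kept small, because $32^n$ quickly exceeds the range in which floating-point cosines are meaningful.

```python
import math


def weierstrass_partial(N, x):
    """N-th partial sum of sum over n >= 1 of 4^(-n) cos(32^n pi x)."""
    return sum(4.0 ** (-n) * math.cos(32.0 ** n * math.pi * x) for n in range(1, N + 1))


def tail_bound(N):
    """M-test tail: sum over n > N of 4^(-n) = 4^(-N) / 3."""
    return 4.0 ** (-N) / 3.0


for x in (0.0, 0.1, 0.37, 1.0):
    gap = abs(weierstrass_partial(8, x) - weierstrass_partial(4, x))
    print(x, gap, tail_bound(4))  # the gap never exceeds the tail bound
```

The same bound holds for every real $x$, not only the sampled ones, which is what makes the convergence uniform.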


— Exercises — 


Exercise 3.7.1 Complete the proof of Theorem 3.7.1. Compare this theorem with 
Example 1.2.10, and explain why this example does not contradict the theorem. 


Exercise 3.7.2 Prove Theorem 3.7.1 without assuming that $f_n'$ is continuous. (This
means that you cannot use the fundamental theorem of calculus. However, the
mean value theorem (Corollary 10.2.9) is still available. Use this to show that
if $d_\infty(f_n', f_m') \leq \varepsilon$, then $|(f_n(x) - f_m(x)) - (f_n(x_0) - f_m(x_0))| \leq \varepsilon |x - x_0|$ for all
$x \in [a, b]$, and then use this to complete the proof of Theorem 3.7.1.)


Exercise 3.7.3 Prove Corollary 3.7.3.


3.8 Uniform Approximation by Polynomials


As we have just seen, continuous functions can be very badly behaved, for instance 
they can be nowhere differentiable (Example 3.7.4). On the other hand, functions 
such as polynomials are always very well behaved, in particular being always dif- 
ferentiable. Fortunately, while most continuous functions are not as well behaved 
as polynomials, they can always be uniformly approximated by polynomials; this 
important (but difficult) result is known as the Weierstrass approximation theorem, 
and is the subject of this section. 


Definition 3.8.1 Let $[a, b]$ be an interval. A polynomial on $[a, b]$ is a function
$f: [a, b] \to \mathbf{R}$ of the form $f(x) := \sum_{j=0}^n c_j x^j$, where $n \geq 0$ is an integer and
$c_0, \dots, c_n$ are real numbers. If $c_n \neq 0$, then $n$ is called the degree of $f$.


Example 3.8.2 The function $f: [1, 2] \to \mathbf{R}$ defined by $f(x) := 3x^4 + 2x^3 - 4x + 5$
is a polynomial on $[1, 2]$ of degree 4.


Theorem 3.8.3 (Weierstrass approximation theorem) If $[a, b]$ is an interval,
$f: [a, b] \to \mathbf{R}$ is a continuous function, and $\varepsilon > 0$, then there exists a polynomial
$P$ on $[a, b]$ such that $d_\infty(P, f) \leq \varepsilon$ (i.e., $|P(x) - f(x)| \leq \varepsilon$ for all $x \in [a, b]$).


Another way of stating this theorem is as follows. Recall that $C([a, b] \to \mathbf{R})$
was the space of continuous functions from $[a, b]$ to $\mathbf{R}$, with the uniform metric
$d_\infty$. Let $P([a, b] \to \mathbf{R})$ be the space of all polynomials on $[a, b]$; this is a subspace
of $C([a, b] \to \mathbf{R})$, since all polynomials are continuous (Exercise 9.4.7). The
Weierstrass approximation theorem then asserts that every continuous function is an
adherent point of $P([a, b] \to \mathbf{R})$; or in other words, that the closure of the space of
polynomials is the space of continuous functions:

$$\overline{P([a, b] \to \mathbf{R})} = C([a, b] \to \mathbf{R}).$$


In particular, every continuous function on $[a, b]$ is the uniform limit of polynomials.
Another way of saying this is that the space of polynomials is dense in the space of
continuous functions, in the uniform topology.

The proof of the Weierstrass approximation theorem is somewhat complicated and 
will be done in stages. We first need the notion of an approximation to the identity. 


Definition 3.8.4 (Compactly supported functions) Let $[a, b]$ be an interval. A function
$f: \mathbf{R} \to \mathbf{R}$ is said to be supported on $[a, b]$ if $f(x) = 0$ for all $x \notin [a, b]$. We
say that $f$ is compactly supported if it is supported on some interval $[a, b]$. If $f$
is continuous and supported on $[a, b]$, we define the improper integral $\int_{-\infty}^\infty f$ to be

$$\int_{-\infty}^\infty f := \int_{[a,b]} f.$$


Note that a function can be supported on more than one interval, for instance a
function which is supported on $[3, 4]$ is also automatically supported on $[2, 5]$ (why?).
In principle, this might mean that our definition of $\int_{-\infty}^\infty f$ is not well defined; however,
this is not the case:


Lemma 3.8.5 If $f: \mathbf{R} \to \mathbf{R}$ is continuous and supported on an interval $[a, b]$, and
is also supported on another interval $[c, d]$, then $\int_{[a,b]} f = \int_{[c,d]} f$.


Proof See Exercise 3.8.1. □


Definition 3.8.6 (Approximation to the identity) Let $\varepsilon > 0$ and $0 < \delta < 1$. A function
$f: \mathbf{R} \to \mathbf{R}$ is said to be an $(\varepsilon, \delta)$-approximation to the identity if it obeys the
following three properties:


(a) $f$ is supported on $[-1, 1]$, and $f(x) \geq 0$ for all $-1 \leq x \leq 1$.
(b) $f$ is continuous, and $\int_{-\infty}^\infty f = 1$.
(c) $|f(x)| \leq \varepsilon$ for all $\delta \leq |x| \leq 1$.


Remark 3.8.7 For those of you who are familiar with the Dirac delta function, 
approximations to the identity are ways to approximate this (very discontinuous) 
delta function by a continuous function (which is easier to analyze). We will not 
however discuss the Dirac delta function in this text. 


Our proof of the Weierstrass approximation theorem relies on three key facts. The 
first fact is that polynomials can be approximations to the identity: 


Lemma 3.8.8 (Polynomials can approximate the identity) For every $\varepsilon > 0$ and
$0 < \delta < 1$ there exists an $(\varepsilon, \delta)$-approximation to the identity which is a polynomial $P$
on $[-1, 1]$.


Proof See Exercise 3.8.2. □


We will use these polynomial approximations to the identity to approximate con- 
tinuous functions by polynomials. We will need the following important notion of a 
convolution. 


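Definition 3.8.9 (Convolution) Let $f: \mathbf{R} \to \mathbf{R}$ and $g: \mathbf{R} \to \mathbf{R}$ be continuous,
compactly supported functions. We define the convolution $f * g: \mathbf{R} \to \mathbf{R}$ of $f$ and $g$ to
be the function

$$(f * g)(x) := \int_{-\infty}^\infty f(y) g(x - y) \, dy.$$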


Note that if f and g are continuous and compactly supported, then for each x 
the function f(y)g(x — y) (thought of as a function of y) is also continuous and 
compactly supported, so the above definition makes sense. 


Remark 3.8.10 Convolutions play an important role in Fourier analysis and in partial
differential equations (PDE), and are also important in physics, engineering, and 
signal processing. An in-depth study of convolution is beyond the scope of this text; 
only a brief treatment will be given here. 


Proposition 3.8.11 (Basic properties of convolution) Let $f: \mathbf{R} \to \mathbf{R}$, $g: \mathbf{R} \to \mathbf{R}$,
and $h: \mathbf{R} \to \mathbf{R}$ be continuous, compactly supported functions. Then the following
statements are true.


(a) The convolution $f * g$ is also a continuous, compactly supported function.

(b) (Convolution is commutative) We have $f * g = g * f$.

(c) (Convolution is linear) We have $f * (g + h) = f * g + f * h$. Also, for any real
number $c$, we have $f * (cg) = (cf) * g = c(f * g)$.


Proof See Exercise 3.8.4. □


Remark 3.8.12 There are many other important properties of convolution, for
instance it is associative, $(f * g) * h = f * (g * h)$, and it commutes with derivatives,
$(f * g)' = f' * g = f * g'$, when $f$ and $g$ are differentiable. The Dirac delta
function $\delta$ mentioned earlier is an identity for convolution: $f * \delta = \delta * f = f$. These
results are slightly harder to prove than the ones in Proposition 3.8.11, however, and
we will not need them in this text.


As mentioned earlier, the proof of the Weierstrass approximation theorem relies 
on three facts. The second key fact is that convolution with polynomials produces 
another polynomial: 


Lemma 3.8.13 Let $f: \mathbf{R} \to \mathbf{R}$ be a continuous function supported on $[0, 1]$, and
let $g: \mathbf{R} \to \mathbf{R}$ be a continuous function supported on $[-1, 1]$ which is a polynomial
on $[-1, 1]$. Then $f * g$ is a polynomial on $[0, 1]$. (Note however that it may be
non-polynomial outside of $[0, 1]$.)


Proof Since $g$ is polynomial on $[-1, 1]$, we may find an integer $n \geq 0$ and real
numbers $c_0, c_1, \dots, c_n$ such that

$$g(x) = \sum_{j=0}^n c_j x^j \text{ for all } x \in [-1, 1].$$

On the other hand, for all $x \in [0, 1]$, we have

$$f * g(x) = \int_{-\infty}^\infty f(y) g(x - y) \, dy = \int_{[0,1]} f(y) g(x - y) \, dy$$

since $f$ is supported on $[0, 1]$. Since $x \in [0, 1]$ and the variable of integration $y$ is
also in $[0, 1]$, we have $x - y \in [-1, 1]$. Thus we may substitute in our formula for
$g$ to obtain

$$f * g(x) = \int_{[0,1]} f(y) \sum_{j=0}^n c_j (x - y)^j \, dy.$$

We expand this using the binomial formula (Exercise 7.1.4) to obtain

$$f * g(x) = \int_{[0,1]} f(y) \sum_{j=0}^n c_j \sum_{k=0}^j \frac{j!}{k!(j-k)!} x^k (-y)^{j-k} \, dy.$$

We can interchange the two summations (by Corollary 7.1.14) to obtain

$$f * g(x) = \int_{[0,1]} f(y) \sum_{k=0}^n \sum_{j=k}^n c_j \frac{j!}{k!(j-k)!} x^k (-y)^{j-k} \, dy$$

(why did the limits of summation change? It may help to plot $j$ and $k$ on a graph). Now
we interchange the $k$ summation with the integral, and observe that $x^k$ is independent
of $y$, to obtain

$$f * g(x) = \sum_{k=0}^n x^k \int_{[0,1]} f(y) \sum_{j=k}^n c_j \frac{j!}{k!(j-k)!} (-y)^{j-k} \, dy.$$

If we thus define

$$C_k := \int_{[0,1]} f(y) \sum_{j=k}^n c_j \frac{j!}{k!(j-k)!} (-y)^{j-k} \, dy$$

for each $k = 0, \dots, n$, then $C_k$ is a number which is independent of $x$, and we have

$$f * g(x) = \sum_{k=0}^n C_k x^k$$

for all $x \in [0, 1]$. Thus $f * g$ is a polynomial on $[0, 1]$. □
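

The algebraic heart of this proof is the rearrangement of $\sum_{j} c_j (x - y)^j$ into powers of $x$ via the binomial formula and the interchange of the $j$ and $k$ summations. The sketch below checks that identity exactly with rational arithmetic; the function names and the sample coefficients are assumptions made purely for illustration.

```python
from fractions import Fraction
from math import comb


def direct(c, x, y):
    """Evaluate sum over j of c_j (x - y)^j, where c = [c_0, ..., c_n]."""
    return sum(cj * (x - y) ** j for j, cj in enumerate(c))


def rearranged(c, x, y):
    """Evaluate sum over k of x^k * (sum over j >= k of c_j C(j, k) (-y)^(j - k))."""
    n = len(c) - 1
    return sum(
        x ** k * sum(c[j] * comb(j, k) * (-y) ** (j - k) for j in range(k, n + 1))
        for k in range(n + 1)
    )


coeffs = [Fraction(3), Fraction(-1, 2), Fraction(0), Fraction(5)]
x, y = Fraction(1, 3), Fraction(3, 4)
print(direct(coeffs, x, y) == rearranged(coeffs, x, y))  # exact equality of rationals
```

Integrating the inner sum against $f(y)$ over $[0, 1]$ is then what produces the coefficients $C_k$ of the lemma.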


The third key fact is that if one convolves a uniformly continuous function with an
approximation to the identity, we obtain a new function which is close to the original
function (which explains the terminology “approximation to the identity”):


Lemma 3.8.14 Let $f: \mathbf{R} \to \mathbf{R}$ be a continuous function supported on $[0, 1]$, which
is bounded by some $M > 0$ (i.e., $|f(x)| \leq M$ for all $x \in \mathbf{R}$), and let $\varepsilon > 0$ and
$0 < \delta < 1$ be such that one has $|f(x) - f(y)| < \varepsilon$ whenever $x, y \in \mathbf{R}$ and $|x - y| < \delta$.
Let $g$ be any $(\varepsilon, \delta)$-approximation to the identity. Then we have

$$|f * g(x) - f(x)| \leq (1 + 4M)\varepsilon$$

for all $x \in [0, 1]$.


Proof See Exercise 3.8.6. □
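

To see Lemma 3.8.14 at work numerically, the sketch below convolves a sample function with kernels of the form $g_N(x) := (1 - x^2)^N / c_N$ on $[-1, 1]$, where $c_N$ normalises the integral to 1. Both the kernel family and the test function $f(y) := y(1 - y)$ are assumptions chosen for illustration, the integrals are approximated by the midpoint rule, and all names are hypothetical. As $N$ grows the kernel concentrates near 0 and the sampled error shrinks.

```python
def f(y):
    """A continuous test function supported on [0, 1] (an illustrative choice)."""
    return y * (1.0 - y) if 0.0 <= y <= 1.0 else 0.0


def midpoints(a, b, m):
    """Midpoints and step of a uniform partition of [a, b] into m pieces."""
    h = (b - a) / m
    return [a + (i + 0.5) * h for i in range(m)], h


def kernel(N, m=4000):
    """Return g_N(x) = (1 - x^2)^N / c_N on [-1, 1], zero elsewhere."""
    pts, h = midpoints(-1.0, 1.0, m)
    c = sum((1.0 - t * t) ** N for t in pts) * h  # midpoint-rule value of c_N

    def g(x):
        return (1.0 - x * x) ** N / c if -1.0 <= x <= 1.0 else 0.0

    return g


def convolve_at(g, x, m=2000):
    """Midpoint-rule approximation of (f * g)(x) = integral over [0, 1] of f(y) g(x - y) dy."""
    pts, h = midpoints(0.0, 1.0, m)
    return sum(f(y) * g(x - y) for y in pts) * h


def max_error(N, samples=21):
    """Largest sampled value of |f * g_N (x) - f(x)| for x in [0, 1]."""
    g = kernel(N)
    grid = [i / (samples - 1) for i in range(samples)]
    return max(abs(convolve_at(g, x) - f(x)) for x in grid)


print(max_error(10), max_error(200))  # the second error is the smaller one
```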


Combining these together, we obtain a preliminary version of the Weierstrass 
approximation theorem: 


Corollary 3.8.15 (Weierstrass approximation theorem I) Let $f: \mathbf{R} \to \mathbf{R}$ be a continuous
function supported on $[0, 1]$. Then for every $\varepsilon > 0$, there exists a function
$P: \mathbf{R} \to \mathbf{R}$ which is polynomial on $[0, 1]$ and such that $|P(x) - f(x)| \leq \varepsilon$ for all
$x \in [0, 1]$.


Proof See Exercise 3.8.7. □


Now we perform a series of modifications to convert Corollary 3.8.15 into the 
actual Weierstrass approximation theorem. We first need a simple lemma. 


Lemma 3.8.16 Let $f: [0, 1] \to \mathbf{R}$ be a continuous function which equals 0 on the
boundary of $[0, 1]$, i.e., $f(0) = f(1) = 0$. Let $F: \mathbf{R} \to \mathbf{R}$ be the function defined
by setting $F(x) := f(x)$ for $x \in [0, 1]$ and $F(x) := 0$ for $x \notin [0, 1]$. Then $F$ is also
continuous.


Proof See Exercise 3.8.9. □


Remark 3.8.17 The function F obtained in Lemma 3.8.16 is sometimes known as 
the extension of f by zero. 


From Corollary 3.8.15 and Lemma 3.8.16 we immediately obtain 


Corollary 3.8.18 (Weierstrass approximation theorem II) Let $f : [0, 1] \to \mathbf{R}$ be a
continuous function such that $f(0) = f(1) = 0$. Then for every $\varepsilon > 0$ there exists a
polynomial $P : [0, 1] \to \mathbf{R}$ such that $|P(x) - f(x)| \le \varepsilon$ for all $x \in [0, 1]$.


Now we strengthen Corollary 3.8.18 by removing the assumption that $f(0) = f(1) = 0$.


Corollary 3.8.19 (Weierstrass approximation theorem III) Let $f : [0, 1] \to \mathbf{R}$ be a
continuous function. Then for every $\varepsilon > 0$ there exists a polynomial $P : [0, 1] \to \mathbf{R}$
such that $|P(x) - f(x)| \le \varepsilon$ for all $x \in [0, 1]$.


Proof Let $F : [0, 1] \to \mathbf{R}$ denote the function

$$F(x) := f(x) - f(0) - x(f(1) - f(0)).$$

Observe that $F$ is also continuous (why?), and that $F(0) = F(1) = 0$. By Corollary
3.8.18, we can thus find a polynomial $Q : [0, 1] \to \mathbf{R}$ such that $|Q(x) - F(x)| \le \varepsilon$
for all $x \in [0, 1]$. But

$$Q(x) - F(x) = Q(x) + f(0) + x(f(1) - f(0)) - f(x),$$

so the claim follows by setting $P$ to be the polynomial $P(x) := Q(x) + f(0) + x(f(1) - f(0))$. □


Finally, we can prove the full Weierstrass approximation theorem.

Proof of Theorem 3.8.3 Let $f : [a, b] \to \mathbf{R}$ be a continuous function on $[a, b]$, and let $\varepsilon > 0$. Let
$g : [0, 1] \to \mathbf{R}$ denote the function

$$g(x) := f(a + (b - a)x) \quad \text{for all } x \in [0, 1].$$

Observe then that

$$f(y) = g((y - a)/(b - a)) \quad \text{for all } y \in [a, b].$$

The function $g$ is continuous on $[0, 1]$ (why?), and so by Corollary 3.8.19 we may
find a polynomial $Q : [0, 1] \to \mathbf{R}$ such that $|Q(x) - g(x)| \le \varepsilon$ for all $x \in [0, 1]$. In
particular, for any $y \in [a, b]$, we have

$$|Q((y - a)/(b - a)) - g((y - a)/(b - a))| \le \varepsilon.$$

If we thus set $P(y) := Q((y - a)/(b - a))$, then we observe that $P$ is also a polynomial
(why?), and so we have $|P(y) - f(y)| \le \varepsilon$ for all $y \in [a, b]$, as desired. □
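
The rescaling step in this proof can be tried numerically. The sketch below is only an illustration under stated assumptions: it uses Bernstein polynomials as a stand-in for the polynomial $Q$ on $[0, 1]$ (the text instead constructs $Q$ by convolution with an approximation to the identity), and the helper names `bernstein` and `approximate_on_interval` are hypothetical. The change of variables $g(x) := f(a + (b - a)x)$ and $P(y) := Q((y - a)/(b - a))$ is the one used in the proof.

```python
from math import comb

def bernstein(g, n):
    """Return the degree-n Bernstein polynomial of g on [0, 1] as a callable.

    This is a substitute for the polynomial Q of Corollary 3.8.19; it is not
    the convolution construction used in the text.
    """
    values = [g(k / n) for k in range(n + 1)]

    def Q(x):
        return sum(values[k] * comb(n, k) * x**k * (1 - x)**(n - k)
                   for k in range(n + 1))

    return Q

def approximate_on_interval(f, a, b, n):
    """Rescale as in the proof: g(x) := f(a + (b - a) x), P(y) := Q((y - a)/(b - a))."""
    g = lambda x: f(a + (b - a) * x)
    Q = bernstein(g, n)
    return lambda y: Q((y - a) / (b - a))

a, b = -1.0, 2.0
P = approximate_on_interval(abs, a, b, 200)

# Sample the error |P(y) - f(y)| on a grid of [a, b] for f(y) = |y|.
grid = [a + (b - a) * i / 300 for i in range(301)]
max_error = max(abs(P(y) - abs(y)) for y in grid)
print(max_error)
```

Increasing the degree `n` should shrink the sampled error, in line with the theorem, although a grid sample of course does not prove a bound on the whole interval.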


Remark 3.8.20 Note that the Weierstrass approximation theorem only works on
bounded intervals $[a, b]$; continuous functions on $\mathbf{R}$ cannot be uniformly approximated
by polynomials. For instance, the exponential function $f : \mathbf{R} \to \mathbf{R}$ defined
by $f(x) := e^x$ (which we shall study rigorously in Sect. 4.5) cannot be approximated
by any polynomial, because exponential functions grow faster than any polynomial
(Exercise 4.5.9) and so there is no way one can even make the sup metric between
$f$ and a polynomial finite.


Remark 3.8.21 There is a generalization of the Weierstrass approximation theorem
to higher dimensions: if $K$ is any compact subset of $\mathbf{R}^n$ (with the Euclidean metric
$d_{l^2}$), and $f : K \to \mathbf{R}$ is a continuous function, then for every $\varepsilon > 0$ there exists
a polynomial $P : K \to \mathbf{R}$ of $n$ variables $x_1, \ldots, x_n$ such that $d_\infty(f, P) < \varepsilon$. This
general theorem can be proven by a more complicated variant of the arguments here,
but we will not do so. (There is in fact an even more general version of this theorem
applicable to an arbitrary metric space, known as the Stone-Weierstrass theorem, but
this is beyond the scope of this text.)


— Exercises — 
Exercise 3.8.1 Prove Lemma 3.8.5. 


Exercise 3.8.2 (a) Prove that for any real number $0 \le y \le 1$ and any natural number
$n \ge 0$, that $(1 - y)^n \ge 1 - ny$. (Hint: induct on $n$. Alternatively, differentiate
with respect to $y$.)

(b) Show that $\int_{-1}^{1} (1 - x^2)^n\, dx \ge \frac{1}{\sqrt{n}}$. (Hint: for $|x| \le 1/\sqrt{n}$, use part (a); for $|x| \ge 1/\sqrt{n}$,
just use the fact that $(1 - x^2)$ is positive. It is also possible to proceed
via trigonometric substitution, but I would not recommend this unless you know
what you are doing.)

(c) Prove Lemma 3.8.8. (Hint: choose $f(x)$ to equal $c(1 - x^2)^N$ for $x \in [-1, 1]$ and
to equal zero for $x \notin [-1, 1]$, where $N$ is a large number, where $c$ is chosen
so that $f$ has integral 1, and use (b).)


Exercise 3.8.3 Let $f : \mathbf{R} \to \mathbf{R}$ be a compactly supported, continuous function.
Show that $f$ is bounded and uniformly continuous. (Hint: the idea is to use
Proposition 2.3.2 and Theorem 2.3.5, but one must first deal with the issue that the domain
$\mathbf{R}$ of $f$ is non-compact.)


Exercise 3.8.4 Prove Proposition 3.8.11. (Hint: to show that f * g is continuous, 
use Exercise 3.8.3.) 


Exercise 3.8.5 Let $f : \mathbf{R} \to \mathbf{R}$ and $g : \mathbf{R} \to \mathbf{R}$ be continuous, compactly supported
functions. Suppose that $f$ is supported on the interval $[0, 1]$, and $g$ is constant on
the interval $[0, 2]$ (i.e., there is a real number $c$ such that $g(x) = c$ for all $x \in [0, 2]$).
Show that the convolution $f * g$ is constant on the interval $[1, 2]$.


Exercise 3.8.6 (a) Let $g$ be an $(\varepsilon, \delta)$ approximation to the identity. Show that
$1 - 2\varepsilon \le \int_{[-\delta,\delta]} g \le 1$.

(b) Prove Lemma 3.8.14. (Hint: begin with the identity

$$f * g(x) = \int f(x - y) g(y)\, dy = \int_{[-\delta,\delta]} f(x - y) g(y)\, dy + \int_{[\delta,1]} f(x - y) g(y)\, dy + \int_{[-1,-\delta]} f(x - y) g(y)\, dy.$$

The idea is to show that the first integral is close to $f(x)$, and that the second
and third integrals are very small. To achieve the former task, use (a) and the fact
that $f(x)$ and $f(x - y)$ are within $\varepsilon$ of each other; to achieve the latter task, use
property (c) of the approximation to the identity and the fact that $f$ is bounded.)


Exercise 3.8.7 Prove Corollary 3.8.15. (Hint: combine Exercise 3.8.3 and Lemmas 
3.8.8, 3.8.13, 3.8.14.) 


Exercise 3.8.8 Let $f : [0, 1] \to \mathbf{R}$ be a continuous function, and suppose that
$\int_{[0,1]} f(x) x^n\, dx = 0$ for all non-negative integers $n = 0, 1, 2, \ldots$. Show that $f$
must be the zero function $f \equiv 0$. (Hint: first show that $\int_{[0,1]} f(x) P(x)\, dx = 0$ for
all polynomials $P$. Then, using the Weierstrass approximation theorem, show that
$\int_{[0,1]} |f(x)|^2\, dx = 0$.)


Exercise 3.8.9 Prove Lemma 3.8.16.


Chapter 4
Power Series


4.1 Formal Power Series 


We now discuss an important subclass of series of functions, that of power series. As 
in earlier chapters, we begin by introducing the notion of a formal power series and 
then focus in later sections on when the series converges to a meaningful function 
and what one can say about the function obtained in this manner. 


Definition 4.1.1 (Formal power series) Let $a$ be a real number. A formal power
series centered at $a$ is any series of the form

$$\sum_{n=0}^{\infty} c_n (x - a)^n$$

where $c_0, c_1, \ldots$ is a sequence of real numbers (not depending on $x$); we refer to $c_n$
as the $n$th coefficient of this series. Note that each term $c_n (x - a)^n$ in this series is a
function of a real variable $x$.


Example 4.1.2 The series $\sum_{n=0}^{\infty} n! (x - 2)^n$ is a formal power series centered at 2.
The series $\sum_{n=0}^{\infty} 2^x (x - 3)^n$ is not a formal power series, since the coefficients $2^x$
depend on $x$.


We call these power series formal because we do not yet assume that these series 
converge for any x. However, these series are automatically guaranteed to converge 
when x = a (why?). In general, the closer x gets to a, the easier it is for this series 
to converge. To make this more precise, we need the following definition. 


Definition 4.1.3 (Radius of convergence) Let $\sum_{n=0}^{\infty} c_n (x - a)^n$ be a formal power
series. We define the radius of convergence $R$ of this series to be the quantity

$$R := \frac{1}{\limsup_{n \to \infty} |c_n|^{1/n}}$$

where we adopt the convention that $\frac{1}{0} = +\infty$ and $\frac{1}{+\infty} = 0$.


Remark 4.1.4 Each number $|c_n|^{1/n}$ is non-negative, so the limit $\limsup_{n \to \infty} |c_n|^{1/n}$
can take on any value from 0 to $+\infty$, inclusive. Thus $R$ can also take on any value
between 0 and $+\infty$ inclusive (in particular it is not necessarily a real number). Note
that the radius of convergence always exists, even if the sequence $|c_n|^{1/n}$ is not
convergent, because the lim sup of any sequence always exists (though it might be
$+\infty$ or $-\infty$).


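Example 4.1.5 The series $\sum_{n=0}^{\infty} n(-2)^n (x - 3)^n$ has radius of convergence

$$\frac{1}{\limsup_{n \to \infty} |n(-2)^n|^{1/n}} = \frac{1}{\limsup_{n \to \infty} 2 n^{1/n}} = \frac{1}{2}.$$

The series $\sum_{n=0}^{\infty} 2^{n^2} (x + 2)^n$ has radius of convergence

$$\frac{1}{\limsup_{n \to \infty} |2^{n^2}|^{1/n}} = \frac{1}{\limsup_{n \to \infty} 2^n} = \frac{1}{+\infty} = 0.$$

The series $\sum_{n=0}^{\infty} 2^{-n^2} (x + 2)^n$ has radius of convergence

$$\frac{1}{\limsup_{n \to \infty} |2^{-n^2}|^{1/n}} = \frac{1}{\limsup_{n \to \infty} 2^{-n}} = \frac{1}{0} = +\infty.$$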
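
The three computations above can be sanity-checked numerically. The sketch below is only a heuristic: it evaluates $|c_n|^{1/n}$ at a few large $n$ rather than taking a genuine lim sup, and it works with $\log |c_n|$ to avoid floating-point overflow. The helper name `root_estimate` is an assumption made for this illustration.

```python
import math

def root_estimate(log_abs_c, n):
    """Return |c_n|^(1/n), computed from log|c_n| to avoid overflow."""
    return math.exp(log_abs_c(n) / n)

# log|c_n| for the three series of Example 4.1.5
log_c1 = lambda n: math.log(n) + n * math.log(2)   # c_n = n (-2)^n
log_c2 = lambda n: n * n * math.log(2)             # c_n = 2^(n^2)
log_c3 = lambda n: -n * n * math.log(2)            # c_n = 2^(-n^2)

for n in (10, 100, 1000):
    # First column tends to 2 (so R = 1/2); second tends to 0 (so R = +infinity).
    print(n, root_estimate(log_c1, n), root_estimate(log_c3, n))

# For the second series |c_n|^(1/n) = 2^n grows without bound, so R = 0.
print(root_estimate(log_c2, 100))

radius_first = 1 / root_estimate(log_c1, 10**5)
print(radius_first)
```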
The significance of the radius of convergence is the following. 


Theorem 4.1.6 Let $\sum_{n=0}^{\infty} c_n (x - a)^n$ be a formal power series, and let $R$ be its
radius of convergence.


(a) (Divergence outside of the radius of convergence) If $x \in \mathbf{R}$ is such that $|x - a| > R$,
then the series $\sum_{n=0}^{\infty} c_n (x - a)^n$ is divergent for that value of $x$.

(b) (Convergence inside the radius of convergence) If $x \in \mathbf{R}$ is such that $|x - a| < R$,
then the series $\sum_{n=0}^{\infty} c_n (x - a)^n$ is absolutely convergent for that value of $x$.


For parts (c)-(e) we assume that $R > 0$ (i.e., the series converges at at least
one other point than $x = a$). Let $f : (a - R, a + R) \to \mathbf{R}$ be the function
$f(x) := \sum_{n=0}^{\infty} c_n (x - a)^n$; this function is guaranteed to exist by (b).


(c) (Uniform convergence on compact sets) For any $0 < r < R$, the series
$\sum_{n=0}^{\infty} c_n (x - a)^n$ converges uniformly to $f$ on the compact interval $[a - r, a + r]$. In
particular, $f$ is continuous on $(a - R, a + R)$.

(d) (Differentiation of power series) The function $f$ is differentiable on $(a - R, a + R)$,
and for any $0 < r < R$, the series $\sum_{n=1}^{\infty} n c_n (x - a)^{n-1}$ converges uniformly
to $f'$ on the interval $[a - r, a + r]$.

(e) (Integration of power series) For any closed interval $[y, z]$ contained in
$(a - R, a + R)$, we have

$$\int_{[y,z]} f = \sum_{n=0}^{\infty} c_n \frac{(z - a)^{n+1} - (y - a)^{n+1}}{n + 1}.$$


Proof See Exercise 4.1.1.


Parts (a) and (b) of Theorem 4.1.6 give another way to find the radius
of convergence, by using your favorite convergence test to work out the range of $x$
for which the power series converges:


Example 4.1.7 Consider the power series $\sum_{n=0}^{\infty} n(x - 1)^n$. The ratio test shows that
this series converges when $|x - 1| < 1$ and diverges when $|x - 1| > 1$ (why?). Thus
the only possible value for the radius of convergence is $R = 1$ (if $R < 1$, then we
have contradicted Theorem 4.1.6(a); if $R > 1$, then we have contradicted Theorem
4.1.6(b)).


Remark 4.1.8 Theorem 4.1.6 is silent on what happens when $|x - a| = R$, i.e., at
the points $a - R$ and $a + R$. Indeed, one can have either convergence or divergence
at those points; see Exercise 4.1.2.


Remark 4.1.9 Note that while Theorem 4.1.6(b) assures us that the power series
$\sum_{n=0}^{\infty} c_n (x - a)^n$ will converge pointwise on the interval $(a - R, a + R)$, it need
not converge uniformly on that interval (see Exercise 4.1.2(e)). On the other hand,
Theorem 4.1.6(c) assures us that the power series will converge uniformly on any smaller
interval $[a - r, a + r]$. In particular, being uniformly convergent on every closed
subinterval of $(a - R, a + R)$ is not enough to guarantee being uniformly convergent
on all of $(a - R, a + R)$.


—Exercise— 


Exercise 4.1.1 Prove Theorem 4.1.6. (Hints: for (a) and (b), use the root test (Theo- 
rem 7.5.1). For (c), use the Weierstrass M-test (Theorem 3.5.7). For (d), use Theorem 
3.7.1. For (e), use Corollary 3.6.2.) 


Exercise 4.1.2 Give examples of a formal power series $\sum_{n=0}^{\infty} c_n x^n$ centered at 0
with radius of convergence 1, which

(a) diverges at both $x = 1$ and $x = -1$;
(b) diverges at $x = 1$ but converges at $x = -1$;
(c) converges at $x = 1$ but diverges at $x = -1$;
(d) converges at both $x = 1$ and $x = -1$;
(e) converges pointwise on $(-1, 1)$, but does not converge uniformly on $(-1, 1)$.


4.2 Real Analytic Functions 


A function $f(x)$ which is lucky enough to be representable as a power series has a
special name; it is a real analytic function.

Definition 4.2.1 (Real analytic functions) Let $E$ be a subset of $\mathbf{R}$, and let $f : E \to \mathbf{R}$
be a function. If $a$ is an interior point of $E$, we say that $f$ is real analytic at $a$ if
there exists an open interval $(a - r, a + r)$ in $E$ for some $r > 0$ such that there exists
a power series $\sum_{n=0}^{\infty} c_n (x - a)^n$ centered at $a$ which has a radius of convergence
greater than or equal to $r$ and which converges to $f$ on $(a - r, a + r)$. If $E$ is an open
set, and $f$ is real analytic at every point $a$ of $E$, we say that $f$ is real analytic on $E$.


Example 4.2.2 Consider the function $f : \mathbf{R} \setminus \{1\} \to \mathbf{R}$ defined by $f(x) := 1/(1 - x)$.
This function is real analytic at 0 because we have a power series $\sum_{n=0}^{\infty} x^n$
centered at 0 which converges to $1/(1 - x) = f(x)$ on the interval $(-1, 1)$. This
function is also real analytic at 2 because we have a power series $\sum_{n=0}^{\infty} (-1)^{n+1} (x - 2)^n$
which converges to $\frac{-1}{1 - (-(x - 2))} = \frac{1}{1 - x} = f(x)$ on the interval $(1, 3)$ (why? Use
Lemma 7.3.3). In fact this function is real analytic on all of $\mathbf{R} \setminus \{1\}$; see Exercise 4.2.2.
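
As a quick numerical sanity check of this example, one can compare partial sums of the two power series with $1/(1 - x)$ at a sample point of each interval. This is only an illustrative sketch; the helper name `partial_sum` and the sample points 0.5 and 2.5 are choices made here, not part of the text.

```python
def partial_sum(coeff, center, x, terms):
    """Partial sum of sum_n coeff(n) * (x - center)^n over n = 0, ..., terms - 1."""
    return sum(coeff(n) * (x - center) ** n for n in range(terms))

f = lambda x: 1 / (1 - x)

# Series sum_n x^n centered at 0, sampled at x = 0.5 in (-1, 1).
at0 = partial_sum(lambda n: 1.0, 0.0, 0.5, 60)

# Series sum_n (-1)^(n+1) (x - 2)^n centered at 2, sampled at x = 2.5 in (1, 3).
at2 = partial_sum(lambda n: (-1.0) ** (n + 1), 2.0, 2.5, 60)

print(at0, f(0.5))
print(at2, f(2.5))
```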


Remark 4.2.3. The notion of being real analytic is closely related to another notion, 
that of being complex analytic, but this is a topic for complex analysis, and will not 
be discussed here. 


We now discuss which functions are real analytic. From Theorem 4.1.6(c) and 
(d) we see that if f is real analytic at a point a, then f is both continuous and 
differentiable on (a —r,a + r) for some r > 0. We can in fact say more: 


Definition 4.2.4 ($k$-times differentiability) Let $E$ be a subset of $\mathbf{R}$ with the property
that every element of $E$ is a limit point of $E$. We say a function $f : E \to \mathbf{R}$ is
once differentiable on $E$ iff it is differentiable (so in particular $f' : E \to \mathbf{R}$ is also
a function on $E$). More generally, for any $k \ge 2$ we say that $f : E \to \mathbf{R}$ is $k$ times
differentiable on $E$, or just $k$ times differentiable, iff $f$ is differentiable, and $f'$ is
$k - 1$ times differentiable. If $f$ is $k$ times differentiable, we define the $k$th derivative
$f^{(k)} : E \to \mathbf{R}$ by the recursive rule $f^{(1)} := f'$, and $f^{(k)} := (f^{(k-1)})'$ for all $k \ge 2$. We
also define $f^{(0)} := f$ (this is $f$ differentiated 0 times), and we allow every function
to be zero times differentiable (since clearly $f^{(0)}$ exists for every $f$). A function is
said to be infinitely differentiable (or smooth) iff it is $k$ times differentiable for every
$k \ge 0$.


Example 4.2.5 The function $f(x) := |x|^3$ is twice differentiable on $\mathbf{R}$, but not three
times differentiable (why?). Indeed, $f^{(2)} = f'' = 6|x|$, which is not differentiable
at 0.


Proposition 4.2.6 (Real analytic functions are $k$-times differentiable) Let $E$ be a
subset of $\mathbf{R}$, let $a$ be an interior point of $E$, and let $f$ be a function which is real
analytic at $a$, thus there is an $r > 0$ for which we have the power series expansion

$$f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$$

for all $x \in (a - r, a + r)$. Then for every $k \ge 0$, the function $f$ is $k$-times differentiable
on $(a - r, a + r)$, and for each $k \ge 0$ the $k$th derivative is given by

$$f^{(k)}(x) = \sum_{n=0}^{\infty} c_{n+k} (n + 1)(n + 2) \cdots (n + k) (x - a)^n = \sum_{n=0}^{\infty} c_{n+k} \frac{(n + k)!}{n!} (x - a)^n$$

for all $x \in (a - r, a + r)$.


Proof See Exercise 4.2.3.
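
To see the formula of Proposition 4.2.6 in action, the following sketch applies it to the geometric series $\sum_{n=0}^{\infty} x^n = 1/(1 - x)$ (so $a = 0$ and $c_n = 1$), and compares a partial sum with the closed form $k!/(1 - x)^{k+1}$ for the $k$th derivative of $1/(1 - x)$. The function name `kth_derivative_series`, the truncation at 200 terms, and the sample values $k = 3$, $x = 0.2$ are assumptions made for illustration.

```python
from math import factorial

def kth_derivative_series(c, a, k, x, terms=200):
    """Partial sum of sum_n c(n + k) * (n + k)!/n! * (x - a)^n."""
    return sum(c(n + k) * (factorial(n + k) // factorial(n)) * (x - a) ** n
               for n in range(terms))

k, x = 3, 0.2
series_value = kth_derivative_series(lambda n: 1.0, 0.0, k, x)

# Closed form for the k-th derivative of 1/(1 - x).
exact_value = factorial(k) / (1 - x) ** (k + 1)

print(series_value, exact_value)
```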


Corollary 4.2.7 (Real analytic functions are infinitely differentiable) Let $E$ be an
open subset of $\mathbf{R}$, and let $f : E \to \mathbf{R}$ be a real analytic function on $E$. Then $f$ is
infinitely differentiable on $E$. Also, all derivatives of $f$ are also real analytic on $E$.


Proof For every point $a \in E$ and $k \ge 0$, we know from Proposition 4.2.6 that $f$
is $k$-times differentiable at $a$ (we will have to apply Exercise 10.1.1 $k$ times here,
why?). Thus $f$ is $k$-times differentiable on $E$ for every $k \ge 0$ and is hence infinitely
differentiable. Also, from Proposition 4.2.6 we see that each derivative $f^{(k)}$ of $f$ has
a convergent power series expansion at every $x \in E$ and thus $f^{(k)}$ is real analytic.


Example 4.2.8 Consider the function $f : \mathbf{R} \to \mathbf{R}$ defined by $f(x) := |x|$. This function
is not differentiable at $x = 0$ and hence cannot be real analytic at $x = 0$. It is
however real analytic at every other point $x \in \mathbf{R} \setminus \{0\}$ (why?).


Remark 4.2.9 The converse statement to Corollary 4.2.7 is not true; there are 
infinitely differentiable functions which are not real analytic. See Exercise 4.5.4. 


Proposition 4.2.6 has an important corollary, due to Brook Taylor (1685-1731). 


Corollary 4.2.10 (Taylor's formula) Let $E$ be a subset of $\mathbf{R}$, let $a$ be an interior
point of $E$, and let $f : E \to \mathbf{R}$ be a function which is real analytic at $a$ and has the
power series expansion

$$f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$$

for all $x \in (a - r, a + r)$ and some $r > 0$. Then for any integer $k \ge 0$, we have

$$f^{(k)}(a) = k!\, c_k,$$

where $k! := 1 \times 2 \times \cdots \times k$ (and we adopt the convention that $0! = 1$). In particular,
we have Taylor's formula

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n$$

for all $x$ in $(a - r, a + r)$.

Proof See Exercise 4.2.4.


The power series $\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n$ is sometimes called the Taylor series of
$f$ around $a$. Taylor's formula thus asserts that if a function is real analytic, then it is
equal to its Taylor series.


Remark 4.2.11 Note that Taylor’s formula only works for functions which are real 
analytic; there are examples of functions which are infinitely differentiable but for 
which Taylor’s theorem fails (see Exercise 4.5.4). 


Another important corollary of Taylor’s formula is that a real analytic function 
can have at most one power series at a point: 


Corollary 4.2.12 (Uniqueness of power series) Let $E$ be a subset of $\mathbf{R}$, let $a$ be an
interior point of $E$, and let $f : E \to \mathbf{R}$ be a function which is real analytic at $a$.
Suppose that $f$ has two power series expansions

$$f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$$

and

$$f(x) = \sum_{n=0}^{\infty} d_n (x - a)^n$$

centered at $a$, each with a nonzero radius of convergence. Then $c_n = d_n$ for all $n \ge 0$.


Proof By Corollary 4.2.10, we have $f^{(k)}(a) = k!\, c_k$ for all $k \ge 0$. But we also have
$f^{(k)}(a) = k!\, d_k$, by similar reasoning. Since $k!$ is never zero, we can cancel it and
obtain $c_k = d_k$ for all $k \ge 0$, as desired.


Remark 4.2.13 While a real analytic function has a unique power series around
any given point, it can certainly have different power series at different points. For
instance, the function $f(x) := \frac{1}{1 - x}$, defined on $\mathbf{R} - \{1\}$, has the power series

$$f(x) := \sum_{n=0}^{\infty} x^n$$

around 0, on the interval $(-1, 1)$, but also has the power series

$$f(x) = \frac{1}{1 - x} = \frac{2}{1 - 2(x - \frac{1}{2})} = \sum_{n=0}^{\infty} 2 \left(2\left(x - \tfrac{1}{2}\right)\right)^n = \sum_{n=0}^{\infty} 2^{n+1} \left(x - \tfrac{1}{2}\right)^n$$

around 1/2, on the interval $(0, 1)$ (note that the above power series has a radius of
convergence of 1/2, thanks to the root test; see also Exercise 4.2.8).

—Exercise—


Exercise 4.2.1 Let $n \ge 0$ be an integer, let $c, a$ be real numbers, and let $f$ be
the function $f(x) := c(x - a)^n$. Show that $f$ is infinitely differentiable, and that
$f^{(k)}(x) = c \frac{n!}{(n - k)!} (x - a)^{n-k}$ for all integers $0 \le k \le n$. What happens when $k > n$?


Exercise 4.2.2 Show that the function f defined in Example 4.2.2 is real analytic 
on all of R\{1}. 


Exercise 4.2.3 Prove Proposition 4.2.6. (Hint: induct on $k$ and use Theorem 4.1.6(d).)


Exercise 4.2.4 Use Proposition 4.2.6 and Exercise 4.2.1 to prove Corollary 4.2.10.


Exercise 4.2.5 Let $a, b$ be real numbers, and let $n \ge 0$ be an integer. Prove the
identity

$$(x - a)^n = \sum_{m=0}^{n} \frac{n!}{m!(n - m)!} (b - a)^{n-m} (x - b)^m$$

for any real number $x$. (Hint: use the binomial formula, Exercise 7.1.4.) Explain why
this identity is consistent with Taylor's theorem and Exercise 4.2.1. (Note however
that Taylor's theorem cannot be rigorously applied until one verifies Exercise 4.2.6
below.)


Exercise 4.2.6 Using Exercise 4.2.5, show that every polynomial P(x) of one vari- 
able is real analytic on R. 


Exercise 4.2.7 Let $m \ge 0$ be an integer, and let $0 < x < r$ be real numbers.
Use Lemma 7.3.3 to establish the identity

$$\frac{r}{r - x} = \sum_{n=0}^{\infty} x^n r^{-n}$$

for all $x \in (-r, r)$. Using Proposition 4.2.6, conclude the identity

$$\frac{r}{(r - x)^{m+1}} = \sum_{n=m}^{\infty} \frac{n!}{m!(n - m)!} x^{n-m} r^{-n}$$

for all integers $m \ge 0$ and $x \in (-r, r)$. Also explain why the series on the right-hand
side is absolutely convergent.
Exercise 4.2.8 Let $E$ be a subset of $\mathbf{R}$, let $a$ be an interior point of $E$, and let
$f : E \to \mathbf{R}$ be a function which is real analytic at $a$ and has a power series expansion

$$f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$$

at $a$ which converges on the interval $(a - r, a + r)$. Let $(b - s, b + s)$ be any subinterval
of $(a - r, a + r)$ for some $s > 0$.

(a) Prove that $|a - b| \le r - s$, so in particular $|a - b| < r$.

(b) Show that for every $0 < \varepsilon < r$, there exists a $C > 0$ such that $|c_n| \le C(r - \varepsilon)^{-n}$
for all integers $n \ge 0$. (Hint: what do we know about the radius of convergence
of the series $\sum_{n=0}^{\infty} c_n (x - a)^n$?)

(c) Show that the numbers $d_0, d_1, \ldots$ given by the formula

$$d_m := \sum_{n=m}^{\infty} \frac{n!}{m!(n - m)!} (b - a)^{n-m} c_n \quad \text{for all integers } m \ge 0$$

are well-defined, in the sense that the above series is absolutely convergent.
(Hint: use (b) and the comparison test, Corollary 7.3.2, followed by Exercise
4.2.7.)

(d) Show that for every $0 < \varepsilon < s$ there exists a $C > 0$ such that

$$|d_m| \le C(s - \varepsilon)^{-m}$$

for all integers $m \ge 0$. (Hint: use the comparison test, and Exercise 4.2.7.)

(e) Show that the power series $\sum_{m=0}^{\infty} d_m (x - b)^m$ is absolutely convergent for
$x \in (b - s, b + s)$ and converges to $f(x)$. (You may need Fubini's theorem for
infinite series, Theorem 8.2.2 of Analysis I, as well as Exercise 4.2.5. One may
also need to study a variant of the $d_m$ in which the $c_n$ are replaced by $|c_n|$.)

(f) Conclude that $f$ is real analytic at every point in $(a - r, a + r)$.


4.3 Abel’s Theorem 


Let $f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$ be a power series centered at $a$ with a radius of
convergence $0 < R < \infty$ strictly between 0 and infinity. From Theorem 4.1.6 we know
that the power series converges absolutely whenever $|x - a| < R$ and diverges when
$|x - a| > R$. However, at the boundary $|x - a| = R$ the situation is more complicated;
the series may either converge or diverge (see Exercise 4.1.2). However, if the
series does converge at the boundary point, then it is reasonably well-behaved; in
particular, it is continuous at that boundary point.


Theorem 4.3.1 (Abel's theorem) Let $f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$ be a power series
centered at $a$ with radius of convergence $0 < R < \infty$. If the power series converges
at $a + R$, then $f$ is continuous at $a + R$, i.e.,

$$\lim_{x \to a + R:\, x \in (a - R, a + R)} \sum_{n=0}^{\infty} c_n (x - a)^n = \sum_{n=0}^{\infty} c_n R^n.$$

Similarly, if the power series converges at $a - R$, then $f$ is continuous at $a - R$, i.e.,

$$\lim_{x \to a - R:\, x \in (a - R, a + R)} \sum_{n=0}^{\infty} c_n (x - a)^n = \sum_{n=0}^{\infty} c_n (-R)^n.$$

Before we prove Abel’s theorem, we need the following lemma. 


Lemma 4.3.2 (Summation by parts formula) Let $(a_n)_{n=0}^{\infty}$ and $(b_n)_{n=0}^{\infty}$ be sequences
of real numbers which converge to limits $A$ and $B$, respectively, i.e., $\lim_{n \to \infty} a_n = A$
and $\lim_{n \to \infty} b_n = B$. Suppose that the sum $\sum_{n=0}^{\infty} (a_{n+1} - a_n) b_n$ is convergent. Then
the sum $\sum_{n=0}^{\infty} a_{n+1} (b_{n+1} - b_n)$ is also convergent, and

$$\sum_{n=0}^{\infty} (a_{n+1} - a_n) b_n = AB - a_0 b_0 - \sum_{n=0}^{\infty} a_{n+1} (b_{n+1} - b_n).$$


Proof See Exercise 4.3.1.


Remark 4.3.3 One should compare this formula with the more well-known integration
by parts formula

$$\int_0^{\infty} f'(x) g(x)\, dx = f(x) g(x) \Big|_0^{\infty} - \int_0^{\infty} f(x) g'(x)\, dx,$$

see Proposition 11.10.1.
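
The summation by parts formula of Lemma 4.3.2 can be checked numerically on a concrete pair of sequences. The sketch below takes $a_n = b_n = 1/(n + 1)$ (so $A = B = 0$), a choice made purely for illustration, and compares truncated versions of both sides; the truncation level `N` is likewise an assumption. For this particular choice both sides can also be summed by partial fractions to $1 - \pi^2/6$, which the script prints for comparison.

```python
import math

def a(n):
    return 1.0 / (n + 1)

def b(n):
    return 1.0 / (n + 1)

A, B = 0.0, 0.0   # limits of a_n and b_n
N = 20000         # truncation level for both infinite sums

# Left-hand side: sum of (a_{n+1} - a_n) b_n, truncated at N terms.
lhs = sum((a(n + 1) - a(n)) * b(n) for n in range(N))

# Right-hand side: AB - a_0 b_0 - sum of a_{n+1} (b_{n+1} - b_n), truncated at N terms.
rhs = A * B - a(0) * b(0) - sum(a(n + 1) * (b(n + 1) - b(n)) for n in range(N))

print(lhs, rhs, 1 - math.pi ** 2 / 6)
```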


Proof of Abel's theorem It will suffice to prove the first claim, i.e., that

$$\lim_{x \to a + R:\, x \in (a - R, a + R)} \sum_{n=0}^{\infty} c_n (x - a)^n = \sum_{n=0}^{\infty} c_n R^n$$

whenever the sum $\sum_{n=0}^{\infty} c_n R^n$ converges; the second claim will then follow (why?) by
replacing $c_n$ by $(-1)^n c_n$ in the above claim. If we make the substitutions $d_n := c_n R^n$
and $y := \frac{x - a}{R}$, then the above claim can be rewritten as

$$\lim_{y \to 1:\, y \in (-1, 1)} \sum_{n=0}^{\infty} d_n y^n = \sum_{n=0}^{\infty} d_n$$

whenever the sum $\sum_{n=0}^{\infty} d_n$ converges. (Why is this equivalent to the previous claim?)
Write $D := \sum_{n=0}^{\infty} d_n$, and for every $N \ge 0$ write

$$S_N := \left( \sum_{n=0}^{N-1} d_n \right) - D$$

so in particular $S_0 = -D$. Then observe that $\lim_{N \to \infty} S_N = 0$, and that $d_n = S_{n+1} - S_n$.
Thus for any $y \in (-1, 1)$ we have

$$\sum_{n=0}^{\infty} d_n y^n = \sum_{n=0}^{\infty} (S_{n+1} - S_n) y^n.$$

Applying the summation by parts formula (Lemma 4.3.2), and noting that $\lim_{n \to \infty} y^n = 0$,
we obtain

$$\sum_{n=0}^{\infty} d_n y^n = -S_0 y^0 - \sum_{n=0}^{\infty} S_{n+1} (y^{n+1} - y^n).$$

Observe that $-S_0 y^0 = +D$. Thus to finish the proof of Abel's theorem, it will suffice
to show that

$$\lim_{y \to 1:\, y \in (-1, 1)} \sum_{n=0}^{\infty} S_{n+1} (y^{n+1} - y^n) = 0.$$

Since $y$ converges to 1, we may as well restrict $y$ to $[0, 1)$ instead of $(-1, 1)$; in
particular we may take $y$ to be non-negative.
From the triangle inequality for series (Proposition 7.2.9), we have

$$\left| \sum_{n=0}^{\infty} S_{n+1} (y^{n+1} - y^n) \right| \le \sum_{n=0}^{\infty} |S_{n+1} (y^{n+1} - y^n)| = \sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1}),$$

so by the squeeze test (Corollary 6.4.14) it suffices to show that

$$\lim_{y \to 1:\, y \in [0, 1)} \sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1}) = 0.$$

The expression $\sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1})$ is clearly non-negative, so it will suffice to
show that

$$\limsup_{y \to 1:\, y \in [0, 1)} \sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1}) = 0.$$

Let $\varepsilon > 0$. Since $S_n$ converges to 0, there exists an $N$ such that $|S_n| \le \varepsilon$ for all $n > N$.
Thus we have

$$\sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1}) \le \sum_{n=0}^{N} |S_{n+1}| (y^n - y^{n+1}) + \sum_{n=N+1}^{\infty} \varepsilon (y^n - y^{n+1}).$$

The last summation is a telescoping series, which sums to $\varepsilon y^{N+1}$ (see Lemma 7.2.14,
recalling from Lemma 6.5.2 that $y^n \to 0$ as $n \to \infty$), and thus

$$\sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1}) \le \sum_{n=0}^{N} |S_{n+1}| (y^n - y^{n+1}) + \varepsilon y^{N+1}.$$

Now take limits as $y \to 1$. Observe that $y^n - y^{n+1} \to 0$ as $y \to 1$ for every
$n \in \{0, 1, \ldots, N\}$. Since we can interchange limits and finite sums (Exercise 7.1.5), we
thus have

$$\limsup_{y \to 1:\, y \in [0, 1)} \sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1}) \le \varepsilon.$$

But $\varepsilon > 0$ was arbitrary, and thus we must have

$$\limsup_{y \to 1:\, y \in [0, 1)} \sum_{n=0}^{\infty} |S_{n+1}| (y^n - y^{n+1}) = 0$$

since the left-hand side must be non-negative. The claim follows.


—Exercise— 


Exercise 4.3.1 Prove Lemma 4.3.2. (Hint: first work out the relationship between
the partial sums $\sum_{n=0}^{N} (a_{n+1} - a_n) b_n$ and $\sum_{n=0}^{N} a_{n+1} (b_{n+1} - b_n)$.)


4.4 Multiplication of Power Series 


We now show that the product of two real analytic functions is again real analytic. 


Theorem 4.4.1 Let $f : (a - r, a + r) \to \mathbf{R}$ and $g : (a - r, a + r) \to \mathbf{R}$ be functions
analytic on $(a - r, a + r)$, with power series expansions

$$f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$$

and

$$g(x) = \sum_{n=0}^{\infty} d_n (x - a)^n,$$

respectively. Then $fg : (a - r, a + r) \to \mathbf{R}$ is also analytic on $(a - r, a + r)$, with
power series expansion

$$f(x) g(x) = \sum_{n=0}^{\infty} e_n (x - a)^n$$

where $e_n := \sum_{m=0}^{n} c_m d_{n-m}$.

Remark 4.4.2 The sequence $(e_n)_{n=0}^{\infty}$ is sometimes referred to as the convolution of
the sequences $(c_n)_{n=0}^{\infty}$ and $(d_n)_{n=0}^{\infty}$; it is closely related (though not identical) to the
notion of convolution introduced in Definition 3.8.9.


Proof We have to show that the series $\sum_{n=0}^{\infty} e_n (x - a)^n$ converges to $f(x) g(x)$ for
all $x \in (a - r, a + r)$. Now fix $x$ to be any point in $(a - r, a + r)$. By Theorem 4.1.6,
we see that both $f$ and $g$ have radii of convergence at least $r$. In particular, the series
$\sum_{n=0}^{\infty} c_n (x - a)^n$ and $\sum_{n=0}^{\infty} d_n (x - a)^n$ are absolutely convergent. Thus if we define

$$C := \sum_{n=0}^{\infty} |c_n (x - a)^n|$$

and

$$D := \sum_{n=0}^{\infty} |d_n (x - a)^n|$$

then $C$ and $D$ are both finite.
For any $N \ge 0$, consider the partial sum

$$\sum_{n=0}^{N} \sum_{m=0}^{\infty} |c_m (x - a)^m d_n (x - a)^n|.$$

We can rewrite this as

$$\sum_{n=0}^{N} |d_n (x - a)^n| \sum_{m=0}^{\infty} |c_m (x - a)^m|,$$

which by definition of $C$ is equal to

$$\sum_{n=0}^{N} |d_n (x - a)^n|\, C,$$

which by definition of $D$ is less than or equal to $DC$. Thus the above partial sums
are bounded by $DC$ for every $N$. In particular, the series

$$\sum_{n=0}^{\infty} \sum_{m=0}^{\infty} |c_m (x - a)^m d_n (x - a)^n|$$

is convergent, which means that the sum

$$\sum_{n=0}^{\infty} \sum_{m=0}^{\infty} c_m (x - a)^m d_n (x - a)^n$$

is absolutely convergent.
Let us now compute this sum in two ways. First of all, we can pull the $d_n (x - a)^n$
factor out of the $\sum_{m=0}^{\infty}$ summation, to obtain

$$\sum_{n=0}^{\infty} d_n (x - a)^n \sum_{m=0}^{\infty} c_m (x - a)^m.$$

By our formula for $f(x)$, this is equal to

$$\sum_{n=0}^{\infty} d_n (x - a)^n f(x);$$

by our formula for $g(x)$, this is equal to $f(x) g(x)$. Thus

$$f(x) g(x) = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} c_m (x - a)^m d_n (x - a)^n.$$
Now we compute this sum in a different way. We rewrite it as

$$f(x) g(x) = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} c_m d_n (x - a)^{n+m}.$$

By Fubini's theorem for series (Theorem 8.2.2), because the series was absolutely
convergent, we may rewrite it as

$$f(x) g(x) = \sum_{m=0}^{\infty} \sum_{n=0}^{\infty} c_m d_n (x - a)^{n+m}.$$

Now make the substitution $n' := n + m$, to rewrite this as

$$f(x) g(x) = \sum_{m=0}^{\infty} \sum_{n'=m}^{\infty} c_m d_{n'-m} (x - a)^{n'}.$$

If we adopt the convention that $d_j = 0$ for all negative $j$, then this is equal to

$$f(x) g(x) = \sum_{m=0}^{\infty} \sum_{n'=0}^{\infty} c_m d_{n'-m} (x - a)^{n'}.$$

Applying Fubini's theorem again, we obtain

$$f(x) g(x) = \sum_{n'=0}^{\infty} \sum_{m=0}^{\infty} c_m d_{n'-m} (x - a)^{n'},$$

which we can rewrite as

$$f(x) g(x) = \sum_{n'=0}^{\infty} (x - a)^{n'} \sum_{m=0}^{\infty} c_m d_{n'-m}.$$

Since $d_j$ was 0 when $j$ is negative, we can rewrite this as

$$f(x) g(x) = \sum_{n'=0}^{\infty} (x - a)^{n'} \sum_{m=0}^{n'} c_m d_{n'-m},$$

which by definition of $e$ is

$$f(x) g(x) = \sum_{n'=0}^{\infty} e_{n'} (x - a)^{n'},$$

as desired.
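
The coefficient formula $e_n = \sum_{m=0}^{n} c_m d_{n-m}$ is easy to experiment with. The sketch below computes it for finitely many coefficients, using exact rational arithmetic; the function name `cauchy_product` is a label chosen here. As a test case it takes $c_n = d_n = 1/n!$, for which the binomial formula predicts $e_n = \sum_{m=0}^{n} \frac{1}{m!(n-m)!} = \frac{2^n}{n!}$.

```python
from fractions import Fraction
from math import factorial

def cauchy_product(c, d):
    """Return e_n = sum_{m=0}^n c_m d_{n-m} for every n covered by both lists."""
    size = min(len(c), len(d))
    return [sum(c[m] * d[n - m] for m in range(n + 1)) for n in range(size)]

# Coefficients c_n = 1/n! for n = 0, ..., 7, as exact fractions.
c = [Fraction(1, factorial(n)) for n in range(8)]
e = cauchy_product(c, c)

for n, value in enumerate(e):
    print(n, value, Fraction(2 ** n, factorial(n)))
```

Only the first `min(len(c), len(d))` coefficients of the product are determined by truncated inputs, which is why the function stops there.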


4.5 The Exponential and Logarithm Functions 


We can now use the machinery developed in the last few sections to develop a 
rigorous foundation for many standard functions used in mathematics. We begin 
with the exponential function. 


Definition 4.5.1 (Exponential function) For every real number $x$, we define the
exponential function $\exp(x)$ to be the real number

$$\exp(x) := \sum_{n=0}^{\infty} \frac{x^n}{n!}.$$


Theorem 4.5.2 (Basic properties of exponential)


(a) For every real number $x$, the series $\sum_{n=0}^{\infty} \frac{x^n}{n!}$ is absolutely convergent. In particular,
$\exp(x)$ exists and is real for every $x \in \mathbf{R}$, the power series $\sum_{n=0}^{\infty} \frac{x^n}{n!}$ has an
infinite radius of convergence, and $\exp$ is a real analytic function on $(-\infty, \infty)$.

(b) $\exp$ is differentiable on $\mathbf{R}$, and for every $x \in \mathbf{R}$, $\exp'(x) = \exp(x)$.

(c) $\exp$ is continuous on $\mathbf{R}$, and for every interval $[a, b]$, we have $\int_{[a,b]} \exp(x)\, dx = \exp(b) - \exp(a)$.

(d) For every $x, y \in \mathbf{R}$, we have $\exp(x + y) = \exp(x) \exp(y)$.

(e) We have $\exp(0) = 1$. Also, for every $x \in \mathbf{R}$, $\exp(x)$ is positive, and $\exp(-x) = 1/\exp(x)$.

(f) $\exp$ is strictly monotone increasing: in other words, if $x, y$ are real numbers,
then we have $\exp(y) > \exp(x)$ if and only if $y > x$.


Proof See Exercise 4.5.1.
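
The defining series converges quickly enough that a short partial sum already illustrates several parts of Theorem 4.5.2. The sketch below is a numerical illustration only: the name `exp_series`, the cutoff of 60 terms, and the sample points are assumptions made here, and Python's `math.e` is used purely as a reference value.

```python
import math

def exp_series(x, terms=60):
    """Partial sum of sum_n x^n / n!, building each term from the previous one."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)   # x^(n+1)/(n+1)! from x^n/n!
    return total

# Value at 1, compared with the library constant.
print(exp_series(1.0), math.e)

# Part (d): exp(x + y) = exp(x) exp(y), at sample points.
print(exp_series(0.7 + 1.1), exp_series(0.7) * exp_series(1.1))

# Part (e): exp(-x) = 1 / exp(x), at a sample point.
print(exp_series(-3.5) * exp_series(3.5))
```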


One can write the exponential function in a more compact form, introducing 
famous Euler’s number e = 2.71828183 ..., also known as the base of the natural 
logarithm: 


Definition 4.5.3 (Euler's number) The number $e$ is defined to be

$$e := \exp(1) = \sum_{n=0}^{\infty} \frac{1}{n!} = \frac{1}{0!} + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots.$$


Proposition 4.5.4 For every real number $x$, we have $\exp(x) = e^x$.


Proof See Exercise 4.5.3. 


In light of this proposition we can and will use $e^x$ and $\exp(x)$ interchangeably.

Since $e > 1$ (why?), we see that $e^x \to +\infty$ as $x \to +\infty$, and $e^x \to 0$ as $x \to -\infty$.
From this and the intermediate value theorem (Theorem 9.7.1) we see that the
range of the function $\exp$ is $(0, \infty)$. Since $\exp$ is strictly increasing, it is injective, and
hence $\exp$ is a bijection from $\mathbf{R}$ to $(0, \infty)$ and thus has an inverse from $(0, \infty)$ to $\mathbf{R}$.
This inverse has a name:


Definition 4.5.5 (Logarithm) We define the natural logarithm function $\log : (0, \infty) \to \mathbf{R}$
(also called $\ln$) to be the inverse of the exponential function. Thus $\exp(\log(x)) = x$
and $\log(\exp(x)) = x$.


Since exp is continuous and strictly monotone increasing, we see that log is also 
continuous and strictly monotone increasing (see Proposition 9.8.3). Since exp is 
also differentiable, and the derivative is never zero, we see from the inverse function 
theorem (Theorem 10.4.2) that log is also differentiable. We list some other properties 
of the natural logarithm below. 


Theorem 4.5.6 (Logarithm properties)

(a) For every $x \in (0, \infty)$, we have $\ln'(x) = \frac{1}{x}$. In particular, by the fundamental
theorem of calculus, we have $\int_{[a,b]} \frac{1}{x}\, dx = \ln(b) - \ln(a)$ for any interval $[a, b]$
in $(0, \infty)$.

(b) We have $\ln(xy) = \ln(x) + \ln(y)$ for all $x, y \in (0, \infty)$.

(c) We have $\ln(1) = 0$ and $\ln(1/x) = -\ln(x)$ for all $x \in (0, \infty)$.

(d) For any $x \in (0, \infty)$ and $y \in \mathbf{R}$, we have $\ln(x^y) = y \ln(x)$.

(e) For any $x \in (-1, 1)$, we have

$$\ln(1 - x) = -\sum_{n=1}^{\infty} \frac{x^n}{n}.$$

In particular, $\ln$ is analytic at 1, with the power series expansion

$$\ln(x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} (x - 1)^n$$

for $x \in (0, 2)$, with radius of convergence 1.


Proof See Exercise 4.5.5.


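Example 4.5.7 We now give a modest application of Abel's theorem (Theorem
4.3.1): from the alternating series test we see that $\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}$ is convergent. By
Abel's theorem we thus see that

$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = \lim_{x \to 2} \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} (x - 1)^n = \lim_{x \to 2} \ln(x) = \ln(2),$$

thus we have the formula

$$\ln(2) = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots.$$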
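
A numerical illustration of this example: partial sums of the logarithm series at points $x$ approaching 2 from the left track $\ln(x)$, and the partial sums at the boundary point $x = 2$ itself approach $\ln(2)$, though only slowly. The function name `log_series` and the numbers of terms below are choices made for this sketch, and `math.log` serves only as a reference value.

```python
import math

def log_series(x, terms):
    """Partial sum of sum_{n>=1} (-1)^(n+1)/n * (x - 1)^n."""
    return sum((-1) ** (n + 1) / n * (x - 1) ** n for n in range(1, terms + 1))

# Inside the interval (0, 2), approaching the boundary point 2.
for x in (1.9, 1.99, 1.999):
    print(x, log_series(x, 20000), math.log(x))

# At the boundary x = 2 the series is the alternating harmonic series.
boundary = log_series(2.0, 100000)
print(boundary, math.log(2))
```

The slow convergence at $x = 2$ (the error after $N$ terms is of size roughly $1/N$) contrasts with the geometric convergence strictly inside the interval.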


—Exercise— 


Exercise 4.5.1 Prove Theorem 4.5.2. (Hints: for part (a), use the ratio test. For parts 
(bc), use Theorem 4.1.6. For part (d), use Theorem 4.4.1. For part (e), use part (d). 
For part (f), use part (d), and prove that exp(x) > 1 when x is positive. You may find 
the binomial formula from Exercise 7.1.4 to be useful.) 


Exercise 4.5.2 Show that for every integer n ≥ 3, we have

$$0 < \frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \cdots < \frac{1}{n!}.$$

(Hint: first show that $(n + k)! > 2^k n!$ for all k = 1, 2, 3, ....) Conclude that n!e is
not an integer for every n ≥ 3. Deduce from this that e is irrational. (Hint: prove by
contradiction.)


Exercise 4.5.3 Prove Proposition 4.5.4. (Hint: first prove the claim when x is a 
natural number. Then prove it when x is an integer. Then prove it when x is a rational
number. Then use the fact that real numbers are the limits of rational numbers to
prove it for all real numbers. You may find the exponent laws (Proposition 6.7.3) to 
be useful.) 


Exercise 4.5.4 Let f: R → R be the function defined by setting f(x) :=
exp(−1/x) when x > 0, and f(x) := 0 when x ≤ 0. Prove that f is infinitely differentiable,
and $f^{(k)}(0) = 0$ for every integer k ≥ 0, but that f is not real analytic at 0.


Exercise 4.5.5 Prove Theorem 4.5.6. (Hints: for part (a), use the inverse function 
theorem (Theorem 10.4.2) or the chain rule (Theorem 10.1.15). For parts (bcd), use 
Theorem 4.5.2 and the exponent laws (Proposition 6.7.3). For part (e), start with the 
geometric series formula (Lemma 7.3.3) and integrate using Theorem 4.1.6). 


Exercise 4.5.6 Prove that the natural logarithm function is real analytic on (0, +∞).


Exercise 4.5.7 Let f: R → (0, ∞) be a positive, real analytic function such that
f′(x) = f(x) for all x ∈ R. Show that $f(x) = Ce^x$ for some positive constant C;
justify your reasoning. (Hint: there are basically three different proofs available. One
proof uses the logarithm function, another proof uses the function $e^{-x}$, and a third
proof uses power series. Of course, you only need to supply one proof.)


Exercise 4.5.8 Let m > 0 be an integer. Show that

$$\lim_{x \to +\infty} \frac{e^x}{x^m} = +\infty.$$

(Hint: what happens to the ratio between $e^{x+1}/(x + 1)^m$ and $e^x/x^m$ as x → +∞?)


Exercise 4.5.9 Let P(x) be a polynomial, and let c > 0. Show that there exists a real
number N > 0 such that $e^{cx} > |P(x)|$ for all x > N; thus an exponentially growing
function, no matter how small the growth rate c, will eventually overtake any given
polynomial P(x), no matter how large. (Hint: use Exercise 4.5.8.)


Exercise 4.5.10 Let f: (0, +∞) × R → R be the exponential function $f(x, y) := x^y$.
Show that f is continuous. (Hint: note that Propositions 9.4.10, 9.4.11 only show
that f is continuous in each variable, which is insufficient, as Exercise 2.2.11 shows.
The easiest way to proceed is to write f(x, y) = exp(y ln x) and use the continuity
of exp() and ln(). For an extra challenge, try proving this exercise without using the
logarithm function.)


4.6 A Digression on Complex Numbers 


To proceed further we need the complex number system C, which is an extension of 
the real number system R. A full discussion of this important number system (and in 
particular the branch of mathematics known as complex analysis) is beyond the scope
of this text; here, we need the system primarily because of a very useful mathematical
operation, the complex exponential function z ↦ exp(z), which generalizes the real
exponential function x ↦ exp(x) introduced in the previous section.

Informally, we could define the complex numbers as 


Definition 4.6.1 (Informal definition of complex numbers) The complex numbers C
are the set of all numbers of the form a + bi, where a, b are real numbers and i is a
square root of −1, $i^2 = -1$.


However, this definition is a little unsatisfactory as it does not explain how to
add, multiply, or compare two complex numbers. To construct the complex numbers
rigorously we will first introduce a formal version of the complex number a + bi,
which we shall temporarily denote as (a, b); this is similar to how in Chap. 4, when
constructing the integers Z, we needed a formal notion of subtraction a −− b before the
actual notion of subtraction a − b could be introduced, or how when constructing
the rational numbers, a formal notion of division a//b was needed before it was
superseded by the actual notion a/b of division. It is also similar to how, in the
construction of the real numbers, we defined a formal limit $\mathrm{LIM}_{n \to \infty} a_n$ before we
defined a genuine limit $\lim_{n \to \infty} a_n$.


Definition 4.6.2 (Formal definition of complex numbers) A complex number is any
pair of the form (a, b), where a, b are real numbers, thus for instance (2, 4) is
a complex number. Two complex numbers (a, b), (c, d) are said to be equal iff
a = c and b = d, thus for instance (2 + 1, 3 + 4) = (3, 7), but (2, 1) ≠ (1, 2) and
(2, 4) ≠ (2, −4). The set of all complex numbers is denoted C.


At this stage the complex numbers C are indistinguishable from the Cartesian
product R² = R × R (also known as the Cartesian plane). However, we will introduce
a number of operations on the complex numbers, notably that of complex
multiplication, which are not normally placed on the Cartesian plane R². Thus one
can think of the complex number system C as the Cartesian plane R² equipped with
a number of additional structures. We begin with the notion of addition and negation.
Using the informal definition of the complex numbers, we expect


(a, b) + (c, d) = (a + bi) + (c + di) = (a + c) + (b + d)i = (a + c, b + d)

and similarly


−(a, b) = −(a + bi) = (−a) + (−b)i = (−a, −b).


As these derivations used the informal definition of the complex numbers, these 
identities have not yet been rigorously proven. However we shall simply encode 
these identities into our complex number system by defining the notion of addition 
and negation by the above rules: 


Definition 4.6.3 (Complex addition, negation, and zero) If z = (a, b) and w =
(c, d) are two complex numbers, we define their sum z + w to be the complex
number z + w := (a + c, b + d). Thus for instance (2, 4) + (3, −1) = (5, 3). We
also define the negation −z of z to be the complex number −z := (−a, −b), thus for
instance −(3, −1) = (−3, 1). We also define the complex zero $0_C$ to be the complex
number $0_C := (0, 0)$.


It is easy to see that the notion of addition is well-defined in the sense that if z = z′
and w = w′ then z + w = z′ + w′. Similarly for negation. The complex addition,
negation, and zero operations obey the usual laws of arithmetic:


Lemma 4.6.4 (The complex numbers are an additive group) If $z_1, z_2, z_3$ are complex
numbers, then we have the commutative property $z_1 + z_2 = z_2 + z_1$, the associative
property $(z_1 + z_2) + z_3 = z_1 + (z_2 + z_3)$, the identity property $z_1 + 0_C = 0_C + z_1 = z_1$,
and the inverse property $z_1 + (-z_1) = (-z_1) + z_1 = 0_C$.


Proof See Exercise 4.6.1. 


Next, we define the notion of complex multiplication and reciprocal. The informal 
justification of the complex multiplication rule is 


(a, b) · (c, d) = (a + bi)(c + di)
= ac + adi + bic + bidi
= (ac − bd) + (ad + bc)i
= (ac − bd, ad + bc)


since $i^2$ is supposed to equal −1. Thus we define


Definition 4.6.5 (Complex multiplication) If z = (a, b) and w = (c, d) are complex
numbers, then we define their product zw to be the complex number zw := (ac −
bd, ad + bc). We also introduce the complex identity $1_C := (1, 0)$.


This operation is easily seen to be well-defined, and also obeys the usual laws of 
arithmetic: 


Lemma 4.6.6 If $z_1, z_2, z_3$ are complex numbers, then we have the commutative
property $z_1 z_2 = z_2 z_1$, the associative property $(z_1 z_2) z_3 = z_1 (z_2 z_3)$, the identity property
$z_1 1_C = 1_C z_1 = z_1$, and the distributivity properties $z_1 (z_2 + z_3) = z_1 z_2 + z_1 z_3$ and
$(z_2 + z_3) z_1 = z_2 z_1 + z_3 z_1$.


Proof See Exercise 4.6.2. 


The above lemma can also be stated more succinctly, as the assertion that C is a 
commutative ring. As is usual, we now write z — w as shorthand for z + (—w). 

We now identify the real numbers R with a subset of the complex numbers C by
identifying any real number x with the complex number (x, 0), thus x = (x, 0). Note
that this identification is consistent with equality (thus x = y iff (x, 0) = (y, 0)), with
addition ($x_1 + x_2 = x_3$ iff $(x_1, 0) + (x_2, 0) = (x_3, 0)$), with negation (x = −y iff
(x, 0) = −(y, 0)), and multiplication ($x_1 x_2 = x_3$ iff $(x_1, 0)(x_2, 0) = (x_3, 0)$), so we
will no longer need to distinguish between “real addition” and “complex addition”,
and similarly for equality, negation, and multiplication. For instance, we can compute
3(2, 4) by identifying the real number 3 with the complex number (3, 0) and then
computing (3, 0)(2, 4) = (3 × 2 − 0 × 4, 3 × 4 + 0 × 2) = (6, 12). Note also that
$0 = 0_C$ and $1 = 1_C$, so we can now drop the C subscripts from the zero 0 and the
identity 1.

We now define i to be the complex number i := (0, 1). We can now reconstruct 
the informal definition of the complex numbers as a lemma: 


Lemma 4.6.7 Every complex number z ∈ C can be written as z = a + bi for exactly
one pair a, b of real numbers. Also, we have $i^2 = -1$, and −z = (−1)z.


Proof See Exercise 4.6.3. 
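
The formal construction above translates almost line by line into code. The following Python sketch models a complex number as a formal pair (a, b) with the addition, negation, and multiplication rules of Definitions 4.6.3 and 4.6.5. The class name `FormalComplex` and the helper `embed` (the identification of a real x with (x, 0)) are our own illustrative names, and integer components are used so that the equalities are exact.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FormalComplex:
    """A complex number as a formal pair (a, b), as in Definition 4.6.2."""
    a: float
    b: float

    def __add__(self, other: "FormalComplex") -> "FormalComplex":
        # (a, b) + (c, d) := (a + c, b + d)
        return FormalComplex(self.a + other.a, self.b + other.b)

    def __neg__(self) -> "FormalComplex":
        # -(a, b) := (-a, -b)
        return FormalComplex(-self.a, -self.b)

    def __mul__(self, other: "FormalComplex") -> "FormalComplex":
        # (a, b)(c, d) := (ac - bd, ad + bc)
        return FormalComplex(self.a * other.a - self.b * other.b,
                             self.a * other.b + self.b * other.a)

def embed(x: float) -> FormalComplex:
    """Identify the real number x with the pair (x, 0)."""
    return FormalComplex(x, 0)

I = FormalComplex(0, 1)  # the complex number i := (0, 1)

print(I * I)                           # the pair (-1, 0), i.e. i^2 = -1
print(embed(3) * FormalComplex(2, 4))  # the pair (6, 12), as computed in the text
print(embed(2) + embed(4) * I)         # the pair (2, 4), i.e. 2 + 4i
```

Equality of two `FormalComplex` values is componentwise, which matches the notion of equality in Definition 4.6.2.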


Because of this lemma, we will now refer to complex numbers in the more usual 
notation a + bi and discard the formal notation (a, b) henceforth. 


Definition 4.6.8 (Real and imaginary parts) If z is a complex number with the
representation z = a + bi for some real numbers a, b, we shall call a the real part of
z and denote ℜ(z) := a, and call b the imaginary part of z and denote ℑ(z) := b, thus
for instance ℜ(3 + 4i) = 3 and ℑ(3 + 4i) = 4, and in general z = ℜ(z) + iℑ(z).
Note that z is real iff ℑ(z) = 0. We say that z is imaginary iff ℜ(z) = 0, thus for
instance 4i is imaginary, while 3 + 4i is neither real nor imaginary, and 0 is both real
and imaginary. We define the complex conjugate $\overline{z}$ of z to be the complex number
$\overline{z} := \Re(z) - i\Im(z)$, thus for instance $\overline{3 + 4i} = 3 - 4i$, $\overline{i} = -i$, and $\overline{3} = 3$.


The operation of complex conjugation has several nice properties: 


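Lemma 4.6.9 (Complex conjugation is an involution) Let z, w be complex numbers,
then $\overline{z + w} = \overline{z} + \overline{w}$, $\overline{-z} = -\overline{z}$, and $\overline{zw} = \overline{z}\,\overline{w}$. Also $\overline{\overline{z}} = z$. Finally, we have $\overline{z} = \overline{w}$
if and only if z = w, and $\overline{z} = z$ if and only if z is real.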


Proof See Exercise 4.6.4. 


The notion of absolute value |x| was defined for rational numbers x in Definition 
4.3.1, and this definition extends to real numbers in the obvious manner. However, 
we cannot extend this definition directly to the complex numbers, as most complex 
numbers are neither positive nor negative. (For instance, we do not classify i as 
either a positive or negative number; see Exercise 4.6.11 for some reasons why.) 
However, we can still define absolute value by generalizing the formula $|x| = \sqrt{x^2}$
from Exercise 5.6.4:


Definition 4.6.10 (Complex absolute value) If z = a + bi is a complex number, we
define the absolute value |z| of z to be the real number $|z| := \sqrt{a^2 + b^2} = (a^2 + b^2)^{1/2}$.


From Exercise 5.6.4 we see that this notion of absolute value generalizes the notion
of real absolute value. The absolute value has a number of other good properties:


Lemma 4.6.11 (Properties of complex absolute value) Let z, w be complex numbers.
Then |z| is a non-negative real number, and |z| = 0 if and only if z = 0. Also we have
the identity $z\overline{z} = |z|^2$, and so $|z| = \sqrt{z\overline{z}}$. As a consequence we have |zw| = |z||w|
and $|\overline{z}| = |z|$. Finally, we have the inequalities

$$-|z| \le \Re(z) \le |z|; \quad -|z| \le \Im(z) \le |z|; \quad |z| \le |\Re(z)| + |\Im(z)|$$

as well as the triangle inequality |z + w| ≤ |z| + |w|.


Proof See Exercise 4.6.6. 
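
Before attempting the proof, it can be reassuring to test the assertions of Lemma 4.6.11 on sample points. The sketch below uses Python's built-in `complex` type and checks each property up to a small floating-point tolerance; the helper name `check_abs_properties`, the tolerance, and the sampling range are arbitrary choices of ours. Such a check is of course no substitute for a proof.

```python
import random

def check_abs_properties(z: complex, w: complex, tol: float = 1e-9) -> bool:
    """Test the claims of Lemma 4.6.11 for one pair (z, w), up to rounding error."""
    return (
        abs(z * z.conjugate() - abs(z) ** 2) <= tol       # z * conj(z) = |z|^2
        and abs(abs(z * w) - abs(z) * abs(w)) <= tol      # |zw| = |z||w|
        and abs(abs(z.conjugate()) - abs(z)) <= tol       # |conj(z)| = |z|
        and abs(z.real) <= abs(z) + tol                   # -|z| <= Re(z) <= |z|
        and abs(z.imag) <= abs(z) + tol                   # -|z| <= Im(z) <= |z|
        and abs(z) <= abs(z.real) + abs(z.imag) + tol     # |z| <= |Re z| + |Im z|
        and abs(z + w) <= abs(z) + abs(w) + tol           # triangle inequality
    )

rng = random.Random(0)  # fixed seed so the experiment is reproducible
samples = [complex(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(200)]
all_ok = all(check_abs_properties(z, w) for z in samples for w in samples)
print(all_ok)
```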


Using the notion of absolute value, we can define a notion of reciprocal: 


Definition 4.6.12 (Complex reciprocal) If z is a nonzero complex number, we
define the reciprocal $z^{-1}$ of z to be the complex number $z^{-1} := |z|^{-2}\overline{z}$ (note
that $|z|^{-2}$ is well-defined as a positive real number because |z| is positive real,
thanks to Lemma 4.6.11). Thus for instance
$(1 + 2i)^{-1} = |1 + 2i|^{-2}(1 - 2i) = (1^2 + 2^2)^{-1}(1 - 2i) = \frac{1}{5} - \frac{2}{5}i$.
If z is zero, z = 0, we leave the reciprocal $0^{-1}$ undefined.


From the definition and Lemma 4.6.11, we see that 


$$z z^{-1} = z^{-1} z = |z|^{-2} \overline{z} z = |z|^{-2} |z|^2 = 1,$$

and so $z^{-1}$ is indeed the reciprocal of z. We can thus define a notion of quotient z/w
for any two complex numbers z, w with w ≠ 0 in the usual manner by the formula
$z/w := z w^{-1}$.
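
The reciprocal formula can be checked exactly on the example (1 + 2i)^{-1} by working with rational components. The following Python sketch represents a complex number as a plain tuple (a, b) of `fractions.Fraction` values; the helper names `conj`, `mul`, `abs_sq`, and `reciprocal` are our own and merely transcribe the definitions above.

```python
from fractions import Fraction

def conj(z):
    """Complex conjugate of the pair (a, b)."""
    a, b = z
    return (a, -b)

def mul(z, w):
    """Product (a, b)(c, d) := (ac - bd, ad + bc)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def abs_sq(z):
    """|z|^2 = a^2 + b^2, which equals z times its conjugate."""
    a, b = z
    return a * a + b * b

def reciprocal(z):
    """z^{-1} := |z|^{-2} conj(z); left undefined (an error) for z = 0."""
    n = abs_sq(z)
    if n == 0:
        raise ZeroDivisionError("0 has no reciprocal")
    a, b = conj(z)
    return (a / n, b / n)

z = (Fraction(1), Fraction(2))  # the complex number 1 + 2i
z_inv = reciprocal(z)
print(z_inv)          # the pair (1/5, -2/5)
print(mul(z, z_inv))  # the pair (1, 0), i.e. z z^{-1} = 1
```

Using exact rationals rather than floating-point numbers means that the identity z z^{-1} = 1 holds exactly here, not merely up to rounding.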

The complex numbers can be given a distance by defining d(z, w) = |z − w|.


Lemma 4.6.13 The complex numbers C with the distance d form a metric space. If
$(z_n)_{n=1}^{\infty}$ is a sequence of complex numbers, and z is another complex number, then we
have $\lim_{n \to \infty} z_n = z$ in this metric space if and only if $\lim_{n \to \infty} \Re(z_n) = \Re(z)$ and
$\lim_{n \to \infty} \Im(z_n) = \Im(z)$.


Proof See Exercise 4.6.9. 


Observe that with our choice of definitions, the space C of complex numbers is
identical (as a metric space) to the Euclidean plane R², since the complex distance
between two complex numbers (a, b), (a′, b′) is exactly the same as the Euclidean
distance $\sqrt{(a - a')^2 + (b - b')^2}$ between these points. Thus, every metric property
that R² satisfies is also obeyed by C; for instance, C is complete and connected, but
not compact.

We also have the usual limit laws: 


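Lemma 4.6.14 (Complex limit laws) Let $(z_n)_{n=1}^{\infty}$ and $(w_n)_{n=1}^{\infty}$ be convergent
sequences of complex numbers, and let c be a complex number. Then the sequences
$(z_n + w_n)_{n=1}^{\infty}$, $(z_n - w_n)_{n=1}^{\infty}$, $(c z_n)_{n=1}^{\infty}$, $(z_n w_n)_{n=1}^{\infty}$, and $(\overline{z_n})_{n=1}^{\infty}$ are also convergent,
with

$$\begin{aligned}
\lim_{n \to \infty} z_n + w_n &= \lim_{n \to \infty} z_n + \lim_{n \to \infty} w_n \\
\lim_{n \to \infty} z_n - w_n &= \lim_{n \to \infty} z_n - \lim_{n \to \infty} w_n \\
\lim_{n \to \infty} c z_n &= c \lim_{n \to \infty} z_n \\
\lim_{n \to \infty} z_n w_n &= \Big(\lim_{n \to \infty} z_n\Big)\Big(\lim_{n \to \infty} w_n\Big) \\
\lim_{n \to \infty} \overline{z_n} &= \overline{\lim_{n \to \infty} z_n}.
\end{aligned}$$

Also, if the $w_n$ are all nonzero and $\lim_{n \to \infty} w_n$ is also nonzero, then $(z_n/w_n)_{n=1}^{\infty}$ is
also a convergent sequence, with

$$\lim_{n \to \infty} z_n/w_n = \Big(\lim_{n \to \infty} z_n\Big) \Big/ \Big(\lim_{n \to \infty} w_n\Big).$$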


Proof See Exercise 4.6.10. 


Observe that the real and complex number systems are in fact quite similar; 
they both obey similar laws of arithmetic, and they have similar structure as metric 
spaces. Indeed many of the results in this textbook that were proven for real-valued 
functions are also valid for complex-valued functions, simply by replacing “real” 
with “complex” in the proofs but otherwise leaving all the other details of the proof 
unchanged. Alternatively, one can always split a complex-valued function f into
real and imaginary parts ℜ(f), ℑ(f), thus f = ℜ(f) + iℑ(f), and then deduce
results for the complex-valued function f from the corresponding results for the
real-valued functions ℜ(f), ℑ(f). For instance, the theory of pointwise and uniform
convergence from Chapter 3, or the theory of power series from this chapter, extends 
without any difficulty to complex-valued functions. In particular, we can define the 
complex exponential function in exactly the same manner as for real numbers: 


Definition 4.6.15 (Complex exponential) If z is a complex number, we define the
function exp(z) by the formula

$$\exp(z) := \sum_{n=0}^{\infty} \frac{z^n}{n!}.$$


Inspired by Proposition 4.5.4, we shall use exp(z) and $e^z$ interchangeably. It is
also possible to define $a^z$ for complex z and other real numbers a > 0, but we will
not need to do so in this text.

One can state and prove the ratio test for complex series and use it to show that
exp(z) converges for every z. It turns out that many of the properties from Theorem
4.5.2 still hold: we have that exp(z + w) = exp(z) exp(w), for instance; see Exercise
4.6.12. (The other properties require complex differentiation and complex integration,
but these topics are beyond the scope of this text.) Another useful observation
is that $\overline{\exp(z)} = \exp(\overline{z})$; this can be seen by conjugating the partial sums $\sum_{n=0}^{N} \frac{z^n}{n!}$
and taking limits as N → ∞.

The complex logarithm turns out to be somewhat more subtle, mainly because exp
is no longer invertible, and also because the various power series for the logarithm
only have a finite radius of convergence (unlike exp, which has an infinite radius of
convergence). This rather delicate issue is beyond the scope of this text and will not
be discussed here.
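
Returning to the series that defines exp(z), its partial sums are easy to explore numerically. The Python sketch below computes the N-th partial sum and compares it against the standard library's `cmath.exp`, which serves only as a reference value. The function name `exp_partial_sum`, the sample points, and the choice N = 40 are our own illustrative assumptions.

```python
import cmath

def exp_partial_sum(z: complex, N: int) -> complex:
    """Return sum_{n=0}^{N} z^n / n!, building each term from the previous one."""
    term = 1 + 0j           # the n = 0 term, z^0 / 0!
    total = term
    for n in range(1, N + 1):
        term *= z / n       # z^n / n! = (z^(n-1) / (n-1)!) * (z / n)
        total += term
    return total

z, w = 1 + 2j, -0.5 + 0.3j
print(abs(exp_partial_sum(z, 40) - cmath.exp(z)))          # close to 0
print(abs(exp_partial_sum(z + w, 40)
          - exp_partial_sum(z, 40) * exp_partial_sum(w, 40)))  # close to 0
```

The second printed quantity is a numerical reflection of the identity exp(z + w) = exp(z) exp(w); the same experiment with z replaced by its conjugate illustrates the conjugation identity.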


—Exercise— 
Exercise 4.6.1 Prove Lemma 4.6.4. 
Exercise 4.6.2 Prove Lemma 4.6.6. 
Exercise 4.6.3 Prove Lemma 4.6.7. 
Exercise 4.6.4 Prove Lemma 4.6.9. 


Exercise 4.6.5 If z is a complex number, show that $\Re(z) = \frac{z + \overline{z}}{2}$ and $\Im(z) = \frac{z - \overline{z}}{2i}$.


Exercise 4.6.6 Prove Lemma 4.6.11. (Hint: to prove the triangle inequality, first
prove that $\Re(z\overline{w}) \le |z||w|$, and hence (from Exercise 4.6.5) that $z\overline{w} + \overline{z}w \le 2|z||w|$.
Then add $|z|^2 + |w|^2$ to both sides of this inequality.)


Exercise 4.6.7 Show that if z, w are complex numbers with w ≠ 0, then |z/w| =
|z|/|w|.


Exercise 4.6.8 Let z, w be nonzero complex numbers. Show that |z + w| = |z| + 
|w| if and only if there exists a positive real number c > 0 such that z = cw. 


Exercise 4.6.9 Prove Lemma 4.6.13. 


Exercise 4.6.10 Prove Lemma 4.6.14. (Hint: split $z_n$ and $w_n$ into real and imaginary
parts and use the usual limit laws, Lemma 6.1.19, combined with Lemma 4.6.13.)


Exercise 4.6.11 The purpose of this exercise is to explain why we do not try to 
organize the complex numbers into positive and negative parts. Suppose that there 
was a notion of a “positive complex number” and a “negative complex number” 
which obeyed the following reasonable axioms (cf. Proposition 4.2.9): 


• (Trichotomy) For every complex number z, exactly one of the following statements
is true: z is positive, z is negative, z is zero.

• (Negation) If z is a positive complex number, then −z is negative. If z is a negative
complex number, then −z is positive.

• (Additivity) If z and w are positive complex numbers, then z + w is also positive.

• (Multiplicativity) If z and w are positive complex numbers, then zw is also positive.


Show that these four axioms are inconsistent, i.e., one can use these axioms to
deduce a contradiction. (Hints: first use the axioms to deduce that 1 is positive, and
then conclude that −1 is negative. Then apply the Trichotomy axiom to z = i and
obtain a contradiction in any one of the three cases.)


Exercise 4.6.12 Prove the ratio test for complex series, and use it to show that the 
series used to define the complex exponential is absolutely convergent. Then prove 
that exp(z + w) = exp(z) exp(w) for all complex numbers z, w. 




4.7 Trigonometric Functions 


We now discuss the next most important class of special functions, after the exponen- 
tial and logarithmic functions, namely the trigonometric functions. (There are several 
other useful special functions in mathematics, such as the hyperbolic trigonometric 
functions and hypergeometric functions, the gamma and zeta functions, and elliptic 
functions, but they occur more rarely and will not be discussed here.) 

Trigonometric functions are often defined using geometric concepts, notably those 
of circles, triangles, and angles. However, it is also possible to define them using more 
analytic concepts and in particular the (complex) exponential function. 


Definition 4.7.1 (Trigonometric functions) If z is a complex number, then we define

$$\cos(z) := \frac{e^{iz} + e^{-iz}}{2}$$

and

$$\sin(z) := \frac{e^{iz} - e^{-iz}}{2i}.$$


We refer to cos and sin as the cosine and sine functions, respectively. 


These formulae were discovered by Leonhard Euler (1707-1783) in 1748, who 
recognized the link between the complex exponential and the trigonometric functions. 
Note that since we have defined the sine and cosine for complex numbers z, we 
automatically have defined them also for real numbers x. In fact in most applications 
one is only interested in the trigonometric functions when applied to real numbers. 

From the power series definition of exp, we have

$$e^{iz} = 1 + iz - \frac{z^2}{2!} - \frac{iz^3}{3!} + \frac{z^4}{4!} + \cdots$$

and

$$e^{-iz} = 1 - iz - \frac{z^2}{2!} + \frac{iz^3}{3!} + \frac{z^4}{4!} - \cdots$$

and so from the above formulae we have

$$\cos(z) = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n}}{(2n)!}$$

and

$$\sin(z) = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n+1}}{(2n+1)!}.$$

In particular, cos(x) and sin(x) are always real whenever x is real. From the ratio
test we see that the two power series $\sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}$, $\sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}$ are absolutely
convergent for every x, thus sin(x) and cos(x) are real analytic at 0 with an infinite
radius of convergence. From Exercise 4.2.8 we thus see that the sine and cosine
functions are real analytic on all of R. (They are also complex analytic on all of
C, but we will not pursue this matter in this text.) In particular the sine and cosine
functions are continuous and differentiable.
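
The agreement between the exponential definition of Definition 4.7.1 and the power series above can be observed numerically. In the Python sketch below, `cos_via_exp` and `sin_via_exp` transcribe the definition using the library routine `cmath.exp`, while `cos_series` and `sin_series` sum finitely many terms of the two power series. All four names, and the default of 30 terms, are our own choices for illustration.

```python
import cmath
import math

def cos_via_exp(z: complex) -> complex:
    """cos(z) := (e^{iz} + e^{-iz}) / 2."""
    return (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2

def sin_via_exp(z: complex) -> complex:
    """sin(z) := (e^{iz} - e^{-iz}) / (2i)."""
    return (cmath.exp(1j * z) - cmath.exp(-1j * z)) / (2j)

def cos_series(x: float, terms: int = 30) -> float:
    """Partial sum of sum_{n>=0} (-1)^n x^(2n) / (2n)!."""
    term, total = 1.0, 0.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))  # pass from term n to term n + 1
    return total

def sin_series(x: float, terms: int = 30) -> float:
    """Partial sum of sum_{n>=0} (-1)^n x^(2n+1) / (2n+1)!."""
    term, total = x, 0.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))  # pass from term n to term n + 1
    return total

x = 1.3
print(abs(cos_via_exp(x) - cos_series(x)), abs(sin_via_exp(x) - sin_series(x)))
print(sin_series(x) ** 2 + cos_series(x) ** 2)  # close to 1
```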


We list some basic properties of the sine and cosine functions below. 


Theorem 4.7.2 (Trigonometric identities) Let x, y be real numbers. 


(a) We have $\sin(x)^2 + \cos(x)^2 = 1$. In particular, we have sin(x) ∈ [−1, 1] and
cos(x) ∈ [−1, 1] for all x ∈ R.

(b) We have sin′(x) = cos(x) and cos′(x) = −sin(x).

(c) We have sin(−x) = −sin(x) and cos(−x) = cos(x).

(d) We have cos(x + y) = cos(x) cos(y) − sin(x) sin(y) and sin(x + y) = sin(x)
cos(y) + cos(x) sin(y).

(e) We have sin(0) = 0 and cos(0) = 1.

(f) We have $e^{ix} = \cos(x) + i\sin(x)$ and $e^{-ix} = \cos(x) - i\sin(x)$. In particular
$\cos(x) = \Re(e^{ix})$ and $\sin(x) = \Im(e^{ix})$.


Proof See Exercise 4.7.1. 


Now we describe some other properties of sin and cos.

Lemma 4.7.3 There exists a positive number x such that sin(x) is equal to 0.


Proof Suppose for sake of contradiction that sin(x) ≠ 0 for all x ∈ (0, ∞). Observe
that this would also imply that cos(x) ≠ 0 for all x ∈ (0, ∞), since if cos(x) = 0
then sin(2x) = 0 by Theorem 4.7.2(d) (why?). Since cos(0) = 1, this implies by the
intermediate value theorem (Theorem 9.7.1) that cos(x) > 0 for all x > 0 (why?).
Also, since sin(0) = 0 and sin′(0) = 1 > 0, we see that sin is increasing near 0, hence
is positive to the right of 0. By the intermediate value theorem again we conclude
that sin(x) > 0 for all x > 0 (otherwise sin would have a zero on (0, ∞)).

In particular if we define the cotangent function cot(x) := cos(x)/sin(x), then
cot(x) would be positive and differentiable on all of (0, ∞). From the quotient
rule (Theorem 10.1.13(h)) and Theorem 4.7.2 we see that the derivative of cot(x)
is $-1/\sin(x)^2$ (why?). In particular, we have cot′(x) ≤ −1 for all x > 0. By the
fundamental theorem of calculus (Theorem 11.9.1) this implies that cot(x + s) ≤
cot(x) − s for all x > 0 and s > 0. But letting s → ∞ we see that this contradicts
our assertion that cot is positive on (0, ∞) (why?).


Let E be the set E := {x ∈ (0, +∞) : sin(x) = 0}, i.e., E is the set of roots of
sin on (0, +∞). By Lemma 4.7.3, E is non-empty. Since sin′(0) > 0, there exists
a c > 0 such that E ⊆ [c, +∞) (see Exercise 4.7.2). Also, since sin is continuous
in [c, +∞), E is closed in [c, +∞) (why? Use Theorem 2.1.5(d)). Since [c, +∞)
is closed in R, we conclude that E is closed in R. Thus E contains all its adherent
points, and thus contains inf(E). Thus if we make the definition


Definition 4.7.4 We define π to be the number

π := inf{x ∈ (0, ∞) : sin(x) = 0}


then we have π ∈ E ⊆ [c, +∞) (so in particular π > 0) and sin(π) = 0. By definition
of π, sin cannot have any zeroes in (0, π), and so in particular must be
positive on (0, π) (cf. the arguments in Lemma 4.7.3 using the intermediate value
theorem). Since cos′(x) = −sin(x), we thus conclude that cos(x) is strictly decreasing
on (0, π). Since cos(0) = 1, this implies in particular that cos(π) < 1; since
$\sin^2(\pi) + \cos^2(\pi) = 1$ and sin(π) = 0, we thus conclude that cos(π) = −1.

In particular we have Euler’s famous formula


$$e^{\pi i} = \cos(\pi) + i \sin(\pi) = -1.$$
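
The definition of π as the smallest positive root of sin suggests a simple numerical experiment: scan to the right from 0 until sin stops being positive, then refine by bisection. The Python sketch below does this using a truncated power series for sin. The names `sin_series` and `first_positive_root`, the scan step 0.01, and the tolerance are our own assumptions. In particular, the scan assumes that no root is skipped between grid points, which is harmless for sin but is not justified by the code itself.

```python
import cmath
import math

def sin_series(x: float, terms: int = 40) -> float:
    """Partial sum of sum_{n>=0} (-1)^n x^(2n+1) / (2n+1)!."""
    term, total = x, 0.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

def first_positive_root(f, step: float = 0.01, tol: float = 1e-12) -> float:
    """Approximate the first sign change of f on (0, infinity), assuming f(step) > 0."""
    lo = step
    while f(lo + step) > 0:   # scan right while f is still positive
        lo += step
    hi = lo + step            # now f(lo) > 0 and f(hi) <= 0
    while hi - lo > tol:      # bisection keeps the sign change inside [lo, hi]
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

pi_estimate = first_positive_root(sin_series)
print(pi_estimate, math.pi)
print(abs(cmath.exp(1j * pi_estimate) + 1))  # close to 0, reflecting Euler's formula
```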


We now conclude with some other properties of sine and cosine. 


Theorem 4.7.5 (Periodicity of trigonometric functions) Let x be a real number. 


(a) We have cos(x + π) = −cos(x) and sin(x + π) = −sin(x). In particular we
have cos(x + 2π) = cos(x) and sin(x + 2π) = sin(x), i.e., sin and cos are periodic
with period 2π.

(b) We have sin(x) = 0 if and only if x/π is an integer.

(c) We have cos(x) = 0 if and only if x/π is an integer plus 1/2.


Proof See Exercise 4.7.3. 


We can of course define all the other trigonometric functions: tangent, cotangent, 
secant, and cosecant, and develop all the familiar identities of trigonometry; some 
examples of this are given in the exercises. 


—Exercise— 


Exercise 4.7.1 Prove Theorem 4.7.2. (Hint: write everything in terms of exponen- 
tials whenever possible.) 


Exercise 4.7.2 Let f: R → R be a function which is differentiable at $x_0$, with
$f(x_0) = 0$ and $f'(x_0) \ne 0$. Show that there exists a c > 0 such that f(y) is nonzero
whenever $0 < |x_0 - y| < c$. Conclude in particular that there exists a c > 0 such that
sin(x) ≠ 0 for all 0 < x < c.


Exercise 4.7.3 Prove Theorem 4.7.5. (Hint: for (c), you may wish to first compute
sin(π/2) and cos(π/2), and then link cos(x) to sin(x + π/2).)


Exercise 4.7.4 Let x, y be real numbers such that $x^2 + y^2 = 1$. Show that there is
exactly one real number θ ∈ (−π, π] such that x = sin(θ) and y = cos(θ). (Hint:
you may need to divide into cases depending on whether x, y are positive, negative,
or zero.)


Exercise 4.7.5 Show that if r, s > 0 are positive real numbers, and θ, α are real
numbers such that $re^{i\theta} = se^{i\alpha}$, then r = s and θ = α + 2πk for some integer k.


Exercise 4.7.6 Let z be a nonzero complex number. Using Exercise 4.7.4, show
that there is exactly one pair of real numbers r, θ such that r > 0, θ ∈ (−π, π], and
$z = re^{i\theta}$. (This is sometimes known as the standard polar representation of z.)


Exercise 4.7.7 For any real number θ and integer n, prove the de Moivre identities
$\cos(n\theta) = \Re((\cos\theta + i\sin\theta)^n)$; $\sin(n\theta) = \Im((\cos\theta + i\sin\theta)^n)$.


Exercise 4.7.8 Let tan: (−π/2, π/2) → R be the tangent function tan(x) :=
sin(x)/cos(x). Show that tan is differentiable and monotone increasing, with
$\frac{d}{dx}\tan(x) = 1 + \tan(x)^2$, and that $\lim_{x \to \pi/2} \tan(x) = +\infty$ and $\lim_{x \to -\pi/2} \tan(x) = -\infty$.
Conclude that tan is in fact a bijection from (−π/2, π/2) → R, and thus has
an inverse function $\tan^{-1}$: R → (−π/2, π/2) (this function is called the arctangent
function). Show that $\tan^{-1}$ is differentiable and $\frac{d}{dx}\tan^{-1}(x) = \frac{1}{1 + x^2}$.

Exercise 4.7.9 Recall the arctangent function $\tan^{-1}$ from Exercise 4.7.8. By modifying
the proof of Theorem 4.5.6(e), establish the identity

$$\tan^{-1}(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{2n+1}$$

for all x ∈ (−1, 1). Using Abel’s theorem (Theorem 4.3.1) to extend this identity to
the case x = 1, conclude in particular the identity

$$\pi = 4 - \frac{4}{3} + \frac{4}{5} - \frac{4}{7} + \cdots = 4 \sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1}.$$

(Note that the series converges by the alternating series test, Proposition 7.2.11.)
Conclude in particular that 4 − 4/3 < π < 4. (One can of course compute π =
3.1415926... to much higher accuracy, though if one wishes to do so it is advisable
to use a different formula than the one above, which converges very slowly.)
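
To see just how slowly this series converges, one can tabulate a few partial sums. The Python sketch below is only an illustration: the function name `leibniz_pi` is ours, and `math.pi` is used as the reference value.

```python
import math

def leibniz_pi(terms: int) -> float:
    """Partial sum 4 * sum_{n=0}^{terms-1} (-1)^n / (2n + 1)."""
    return 4 * sum((-1) ** n / (2 * n + 1) for n in range(terms))

for terms in (1, 2, 10, 1000, 100000):
    approx = leibniz_pi(terms)
    print(terms, approx, abs(approx - math.pi))
```

The first two partial sums are 4 and 4 − 4/3, which bracket π as claimed in the exercise. Roughly speaking, each additional correct decimal digit requires about ten times as many terms.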


Exercise 4.7.10 Let f: R → R be the function

$$f(x) := \sum_{n=1}^{\infty} 4^{-n} \cos(32^n \pi x).$$

(a) Show that this series is uniformly convergent, and that f is continuous.

(b) Show that for every integer j and every integer m ≥ 1, we have

$$\left| f\left(\frac{j+1}{32^m}\right) - f\left(\frac{j}{32^m}\right) \right| \ge 4^{-m}.$$

(Hint: use the identity

$$\sum_{n=1}^{\infty} a_n = \left(\sum_{n=1}^{m-1} a_n\right) + a_m + \sum_{n=m+1}^{\infty} a_n$$

for certain sequences $a_n$. Also, use the fact that the cosine function is periodic
with period 2π, as well as the geometric series formula $\sum_{n=0}^{\infty} r^n = \frac{1}{1-r}$ for any
|r| < 1. Finally, you will need the inequality |cos(x) − cos(y)| ≤ |x − y| for
any real numbers x and y; this can be proven by using the mean value theorem
(Corollary 10.2.9), or the fundamental theorem of calculus (Theorem 11.9.4).)

(c) Using (b), show that for every real number $x_0$, the function f is not differentiable
at $x_0$. (Hint: for every $x_0$ and every m ≥ 1, there exists an integer j such that
$j \le 32^m x_0 \le j + 1$, thanks to Exercise 5.4.3.)

(d) Explain briefly why the result in (c) does not contradict Corollary 3.7.3.


Chapter 5
Fourier Series


In the previous two chapters, we discussed the issue of how certain functions (for 
instance, compactly supported continuous functions) could be approximated by poly- 
nomials. Later, we showed how a different class of functions (real analytic functions) 
could be written exactly (not approximately) as an infinite polynomial, or more pre- 
cisely a power series. 

Power series are already immensely useful, especially when dealing with special
functions such as the exponential and trigonometric functions discussed earlier.
However, there are some circumstances where power series are not so useful, because
one has to deal with functions (e.g., √x) which are not real analytic, and so do not
have power series.

Fortunately, there is another type of series expansion, known as Fourier series, 
which is also a very powerful tool in analysis (though used for slightly different 
purposes). Instead of analyzing compactly supported functions, it instead analyzes 
periodic functions; instead of decomposing into polynomials, it decomposes into 
trigonometric polynomials. Roughly speaking, the theory of Fourier series asserts 
that just about every periodic function can be decomposed as an (infinite) sum of 
sines and cosines. 


Remark 5.0.1 Jean-Baptiste Fourier (1768-1830) was, among other things, an 
administrator accompanying Napoleon on his invasion of Egypt, and then a Pre- 
fect in France during Napoleon’s reign. After the Napoleonic wars, he returned to 
mathematics. He introduced Fourier series in an important 1807 paper in which he 
used them to solve what is now known as the heat equation. At the time, the claim
that every periodic function could be expressed as a sum of sines and cosines was
extremely controversial; even such leading mathematicians as Euler declared that
it was impossible. Nevertheless, Fourier managed to show that this was indeed the
case, although the proof was not completely rigorous and was not totally accepted
for almost another hundred years.

There will be some similarities between the theory of Fourier series and that of
power series, but there are also some major differences. For instance, the convergence
of Fourier series is usually not uniform (i.e., not in the $L^\infty$ metric), but instead we
have convergence in a different metric, the $L^2$-metric. Also, we will need to use
complex numbers heavily in our theory, while they played only a tangential role in
power series.

The theory of Fourier series (and of related topics such as Fourier integrals and 
the Laplace transform) is vast, and deserves an entire course in itself. It has many, 
many applications, most directly to differential equations, signal processing, electri- 
cal engineering, physics, and analysis, but also to algebra and number theory. We will 
only give the barest bones of the theory here, however, and almost no applications. 


5.1 Periodic Functions 


The theory of Fourier series has to do with the analysis of periodic functions, which 
we now define. It turns out to be convenient to work with complex-valued functions 
rather than real-valued ones. 


Definition 5.1.1 Let L > 0 be a real number. A function f: R → C is periodic
with period L, or L-periodic, if we have f(x + L) = f(x) for every real number x.


Example 5.1.2 The real-valued functions f(x) = sin(x) and f(x) = cos(x) are 2π-periodic,
as is the complex-valued function $f(x) = e^{ix}$. These functions are also 4π-periodic,
6π-periodic, etc. (why?). The function f(x) = x, however, is not periodic.
The constant function f(x) = 1 is L-periodic for every L.


Remark 5.1.3 If a function f is L-periodic, then we have f(x + kL) = f(x) for 
every integer k (why? Use induction for the positive k, and then use a substitution to 
convert the positive k result to a negative k result. The k = 0 case is of course trivial). 
In particular, if a function $f$ is 1-periodic, then we have $f(x + k) = f(x)$ for every
$k \in \mathbf{Z}$. Because of this, 1-periodic functions are sometimes also called $\mathbf{Z}$-periodic
(and $L$-periodic functions called $L\mathbf{Z}$-periodic).


Example 5.1.4 For any integer $n$, the functions $\cos(2\pi nx)$, $\sin(2\pi nx)$, and $e^{2\pi inx}$
are all $\mathbf{Z}$-periodic. (What happens when $n$ is not an integer?) Another example of
a $\mathbf{Z}$-periodic function is the function $f: \mathbf{R} \to \mathbf{C}$ defined by $f(x) := 1$ when
$x \in [n, n + \frac{1}{2})$ for some integer $n$, and $f(x) := 0$ when $x \in [n + \frac{1}{2}, n + 1)$ for some
integer $n$. This function is an example of a square wave.


Henceforth, for simplicity, we shall only deal with functions which are Z-periodic 
(for the Fourier theory of L-periodic functions, see Exercise 5.5.6). Note that in order 
to completely specify a $\mathbf{Z}$-periodic function $f: \mathbf{R} \to \mathbf{C}$, one only needs to specify
its values on the interval $[0, 1)$, since this will determine the values of $f$ everywhere
else. This is because every real number $x$ can be written in the form $x = k + y$
where $k$ is an integer (called the integer part of $x$, and sometimes denoted $[x]$) and
$y \in [0, 1)$ (this is called the fractional part of $x$, and sometimes denoted $\{x\}$); see
Exercise 5.1.1. Because of this, sometimes when we wish to describe a $\mathbf{Z}$-periodic
function $f$ we just describe what it does on the interval $[0, 1)$, and then say that it is
extended periodically to all of $\mathbf{R}$. This means that we define $f(x)$ for any real number
$x$ by setting $f(x) := f(y)$, where we have decomposed $x = k + y$ as discussed above.
(One can in fact replace the interval [0, 1) by any other half-open interval of length 
1, but we will not do so here.) 
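The periodic extension just described can be sketched in a few lines of Python. This is only an illustration under our own naming: the helpers `fractional_part` and `extend_periodically` are hypothetical and do not come from the text.

```python
import math


def fractional_part(x: float) -> float:
    """Return y = x - k, where k = floor(x) is the integer part of x."""
    # In exact arithmetic y lies in [0, 1). Floating-point rounding can
    # return 1.0 for tiny negative x; this sketch ignores that edge case.
    return x - math.floor(x)


def extend_periodically(f_on_unit_interval):
    """Extend a function described on [0, 1) to a Z-periodic function on R."""
    def f(x: float):
        return f_on_unit_interval(fractional_part(x))
    return f


def square_wave_on_unit_interval(y: float) -> float:
    # The square wave of Example 5.1.4, described only on [0, 1).
    return 1.0 if y < 0.5 else 0.0


square_wave = extend_periodically(square_wave_on_unit_interval)
```

By construction `square_wave(x + k)` agrees with `square_wave(x)` for every integer `k`, because both arguments have the same fractional part.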

The space of complex-valued continuous $\mathbf{Z}$-periodic functions is denoted
$C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$. (The notation $\mathbf{R}/\mathbf{Z}$ comes from algebra, and denotes the quotient group of the
additive group $\mathbf{R}$ by the additive group $\mathbf{Z}$; more information on this can be found in any
algebra text.) By “continuous” we mean continuous at all points on $\mathbf{R}$; merely being
continuous on an interval such as $[0, 1]$ will not suffice, as there may be a discontinuity
between the left and right limits at 1 (or at any other integer). Thus for instance, the
functions $\sin(2\pi nx)$, $\cos(2\pi nx)$, and $e^{2\pi inx}$ (for integer $n$) are all elements of
$C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, as are the constant functions. However, the square wave function described
earlier is not in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ because it is not continuous. The function $\sin(x)$ also does
not belong to $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, since it is not $\mathbf{Z}$-periodic.


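Lemma 5.1.5 (Basic properties of $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$)

(a) (Boundedness) If $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, then $f$ is bounded (i.e., there exists a real
number $M > 0$ such that $|f(x)| \le M$ for all $x \in \mathbf{R}$).

(b) (Vector space and algebra properties) If $f, g \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, then the functions
$f + g$, $f - g$, and $fg$ are also in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$. Also, if $c$ is any complex number,
then the function $cf$ is also in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.

(c) (Closure under uniform limits) If $(f_n)_{n=1}^{\infty}$ is a sequence of functions in
$C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ which converges uniformly to another function $f: \mathbf{R} \to \mathbf{C}$, then $f$ is
also in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.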


Proof See Exercise 5.1.2. 


One can make C(R/Z; C) into a metric space by re-introducing the now familiar 
sup norm metric 


$$d_\infty(f, g) := \sup_{x \in \mathbf{R}} |f(x) - g(x)| = \sup_{x \in [0,1)} |f(x) - g(x)|$$

of uniform convergence. (Why is the first supremum the same as the second?) See
Exercise 5.1.3.
— Exercise — 


Exercise 5.1.1 Show that every real number $x$ can be written in exactly one way in
the form $x = k + y$, where $k$ is an integer and $y \in [0, 1)$. (Hint: to prove existence
of such a representation, set $k := \sup\{l \in \mathbf{Z} : l \le x\}$.)

Exercise 5.1.2 Prove Lemma 5.1.5. (Hint: for (a), first show that $f$ is bounded on
$[0, 1]$.)


Exercise 5.1.3 Show that $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ with the sup norm metric $d_\infty$ is a metric space.
Furthermore, show that this metric space is complete.


5.2 Inner Products on Periodic Functions


From Lemma 5.1.5 we know that we can add, subtract, multiply, and take limits of 
continuous periodic functions. We will need a couple more operations on the space 
C(R/Z; C), though. The first one is that of inner product. 


Definition 5.2.1 (Inner product) If $f, g \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, we define the inner product
$\langle f, g \rangle$ to be the quantity

$$\langle f, g \rangle := \int_{[0,1]} f(x) \overline{g(x)} \, dx.$$


Remark 5.2.2 In order to integrate a complex-valued function $f(x) = g(x) + ih(x)$,
we use the definition that $\int_{[a,b]} f := \int_{[a,b]} g + i \int_{[a,b]} h$; i.e., we integrate the
real and imaginary parts of the function separately. Thus for instance
$\int_{[1,2]} (1 + ix) \, dx = \int_{[1,2]} 1 \, dx + i \int_{[1,2]} x \, dx = 1 + \frac{3}{2} i$. It is easy to verify that all the standard
rules of calculus (integration by parts, fundamental theorem of calculus, substitution,
etc.) still hold when the functions are complex-valued instead of real-valued.
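As a rough numerical illustration of integrating the real and imaginary parts separately, here is a minimal Python sketch. The function `integrate_complex` is a hypothetical helper of ours, and the midpoint rule is just one convenient choice of quadrature; it is not part of the theory.

```python
def integrate_complex(f, a: float, b: float, steps: int = 1000) -> complex:
    """Midpoint-rule approximation of the integral of f over [a, b].

    The real and imaginary parts are summed separately, mirroring the
    definition of the integral of a complex-valued function.
    """
    h = (b - a) / steps
    midpoints = [a + (k + 0.5) * h for k in range(steps)]
    real_part = sum(f(x).real for x in midpoints) * h
    imag_part = sum(f(x).imag for x in midpoints) * h
    return complex(real_part, imag_part)


# The example from the remark: the integral of 1 + ix over [1, 2].
value = integrate_complex(lambda x: 1 + 1j * x, 1.0, 2.0)
```

Here `value` should agree with $1 + \frac{3}{2}i$ up to rounding error, since the midpoint rule is exact for linear integrands.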


Example 5.2.3 Let $f$ be the constant function $f(x) := 1$, and let $g(x)$ be the function
$g(x) := e^{2\pi ix}$. Then we have

$$\langle f, g \rangle = \int_{[0,1]} 1 \cdot \overline{e^{2\pi ix}} \, dx = \int_{[0,1]} e^{-2\pi ix} \, dx = \left. \frac{e^{-2\pi ix}}{-2\pi i} \right|_{x=0}^{x=1} = \frac{e^{-2\pi i} - 1}{-2\pi i} = 0.$$

Remark 5.2.4 In general, the inner product $\langle f, g \rangle$ will be a complex number. (Note
that $f(x)\overline{g(x)}$ will be Riemann integrable since both functions are bounded and
continuous.)


Roughly speaking, the inner product $\langle f, g \rangle$ is to the space $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ what the
dot product $x \cdot y$ is to Euclidean spaces such as $\mathbf{R}^n$. We list some basic properties of
the inner product below; a more in-depth study of inner products on vector spaces
can be found in any linear algebra text but is beyond the scope of this text.


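Lemma 5.2.5 Let $f, g, h \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.

(a) (Hermitian property) We have $\langle g, f \rangle = \overline{\langle f, g \rangle}$.

(b) (Positivity) We have $\langle f, f \rangle \ge 0$. Furthermore, we have $\langle f, f \rangle = 0$ if and only if
$f = 0$ (i.e., $f(x) = 0$ for all $x \in \mathbf{R}$).

(c) (Linearity in the first variable) We have $\langle f + g, h \rangle = \langle f, h \rangle + \langle g, h \rangle$. For any
complex number $c$, we have $\langle cf, g \rangle = c \langle f, g \rangle$.

(d) (Antilinearity in the second variable) We have $\langle f, g + h \rangle = \langle f, g \rangle + \langle f, h \rangle$. For
any complex number $c$, we have $\langle f, cg \rangle = \overline{c} \langle f, g \rangle$.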


Proof See Exercise 5.2.1. 


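From the positivity property, it makes sense to define the $L^2$ norm $\|f\|_2$ of a
function $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ by the formula

$$\|f\|_2 := \sqrt{\langle f, f \rangle} = \left( \int_{[0,1]} f(x) \overline{f(x)} \, dx \right)^{1/2} = \left( \int_{[0,1]} |f(x)|^2 \, dx \right)^{1/2}.$$

Thus $\|f\|_2 \ge 0$ for all $f$. The norm $\|f\|_2$ is sometimes called the root mean square
of $f$.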

Example 5.2.6 If $f(x)$ is the function $e^{2\pi ix}$, then

$$\|f\|_2 = \left( \int_{[0,1]} e^{2\pi ix} e^{-2\pi ix} \, dx \right)^{1/2} = \left( \int_{[0,1]} 1 \, dx \right)^{1/2} = 1^{1/2} = 1.$$

This $L^2$ norm is related to, but is distinct from, the $L^\infty$ norm $\|f\|_\infty := \sup_{x \in \mathbf{R}} |f(x)|$.
For instance, if $f(x) = \sin(2\pi x)$, then $\|f\|_\infty = 1$ but $\|f\|_2 = \frac{1}{\sqrt{2}}$. In general, the
best one can say is that $0 \le \|f\|_2 \le \|f\|_\infty$; see Exercise 5.2.3.
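The inner product and the two norms can be approximated numerically. The following Python sketch uses a midpoint Riemann sum on $[0, 1]$; the helper names (`inner_product`, `l2_norm`, `sup_norm`) and the choice of quadrature are assumptions made for illustration only.

```python
import cmath
import math


def inner_product(f, g, steps: int = 2000) -> complex:
    """Midpoint-rule approximation of <f, g> = integral over [0,1] of f * conj(g)."""
    h = 1.0 / steps
    midpoints = [(k + 0.5) * h for k in range(steps)]
    return sum(f(x) * complex(g(x)).conjugate() for x in midpoints) * h


def l2_norm(f, steps: int = 2000) -> float:
    """Approximate the L^2 norm sqrt(<f, f>)."""
    return math.sqrt(inner_product(f, f, steps).real)


def sup_norm(f, steps: int = 2000) -> float:
    """Largest value of |f| on a grid of [0, 1]; a lower bound for the true sup."""
    return max(abs(f(k / steps)) for k in range(steps + 1))


def one(x: float) -> float:
    return 1.0


def e1(x: float) -> complex:
    return cmath.exp(2j * math.pi * x)


def sine(x: float) -> float:
    return math.sin(2 * math.pi * x)
```

With these definitions, `inner_product(one, e1)` should be numerically zero (Example 5.2.3), `l2_norm(e1)` should be close to 1 (Example 5.2.6), and `l2_norm(sine)` should be close to $1/\sqrt{2}$ while `sup_norm(sine)` is close to 1.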
Some basic properties of the $L^2$ norm are given below.


Lemma 5.2.7 Let $f, g \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.

(a) (Non-degeneracy) We have $\|f\|_2 = 0$ if and only if $f = 0$.
(b) (Cauchy-Schwarz inequality) We have $|\langle f, g \rangle| \le \|f\|_2 \|g\|_2$.
(c) (Triangle inequality) We have $\|f + g\|_2 \le \|f\|_2 + \|g\|_2$.
(d) (Pythagoras’ theorem) If $\langle f, g \rangle = 0$, then $\|f + g\|_2^2 = \|f\|_2^2 + \|g\|_2^2$.
(e) (Homogeneity) We have $\|cf\|_2 = |c| \|f\|_2$ for all $c \in \mathbf{C}$.


Proof See Exercise 5.2.2. 


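In light of Pythagoras’ theorem, we sometimes say that $f$ and $g$ are orthogonal
iff $\langle f, g \rangle = 0$.

We can now define the $L^2$ metric $d_{L^2}$ on $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ by defining

$$d_{L^2}(f, g) := \|f - g\|_2 = \left( \int_{[0,1]} |f(x) - g(x)|^2 \, dx \right)^{1/2}.$$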


Remark 5.2.8 One can verify that $d_{L^2}$ is indeed a metric (Exercise 5.2.4). Indeed,
the $L^2$ metric is very similar to the $l^2$ metric on Euclidean spaces $\mathbf{R}^n$, which is why
the notation is deliberately chosen to be similar; you should compare the two metrics
yourself to see the analogy.


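Note that a sequence $f_n$ of functions in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ will converge in the $L^2$ metric
to $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ if $d_{L^2}(f_n, f) \to 0$ as $n \to \infty$, or in other words that

$$\lim_{n \to \infty} \int_{[0,1]} |f_n(x) - f(x)|^2 \, dx = 0.$$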


Remark 5.2.9 The notion of convergence in $L^2$ metric is different from that of
uniform or pointwise convergence; see Exercise 5.2.6.


Remark 5.2.10 The $L^2$ metric is not as well-behaved as the $L^\infty$ metric. For instance,
it turns out the space $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ is not complete in the $L^2$ metric, despite being
complete in the $L^\infty$ metric; see Exercise 5.2.5.


— Exercise — 


Exercise 5.2.1 Prove Lemma 5.2.5. (Hint: the last part of (b) is a little tricky. You
may need to prove by contradiction, assuming that $f$ is not the zero function, and
then show that $\int_{[0,1]} |f(x)|^2 \, dx$ is strictly positive. You will need to use the fact that $f$,
and hence $|f|$, is continuous, to do this.)


Exercise 5.2.2 Prove Lemma 5.2.7. (Hint: use Lemma 5.2.5 frequently. For the
Cauchy-Schwarz inequality, begin with the positivity property $\langle f, f \rangle \ge 0$, but with $f$
replaced by the function $f \|g\|_2^2 - \langle f, g \rangle g$, and then simplify using Lemma 5.2.5. You
may have to treat the case $\|g\|_2 = 0$ separately. Use the Cauchy-Schwarz inequality
to prove the triangle inequality.)


Exercise 5.2.3 If $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ is a non-zero function, show that
$0 < \|f\|_2 \le \|f\|_\infty$. Conversely, if $0 < A \le B$ are real numbers, show that there exists a
non-zero function $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ such that $\|f\|_2 = A$ and $\|f\|_\infty = B$. (Hint: let $g$
be a non-constant non-negative real-valued function in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, and consider
functions $f$ of the form $f = (c + dg)^{1/2}$ for some constant real numbers $c, d > 0$.)


Exercise 5.2.4 Prove that the $L^2$ metric $d_{L^2}$ on $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ does indeed turn
$C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ into a metric space (cf. Exercise 1.1.6).


Exercise 5.2.5 Find a sequence of continuous periodic functions which converge
in $L^2$ to a discontinuous periodic function. (Hint: try converging to the square wave
function.)


Exercise 5.2.6 Let $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, and let $(f_n)_{n=1}^{\infty}$ be a sequence of functions in
$C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.

(a) Show that if $f_n$ converges uniformly to $f$, then $f_n$ also converges to $f$ in the $L^2$
metric.

(b) Give an example where $f_n$ converges to $f$ in the $L^2$ metric, but does not converge
to $f$ uniformly. (Hint: take $f = 0$. Try to make the functions $f_n$ large in sup
norm.)

(c) Give an example where $f_n$ converges to $f$ in the $L^2$ metric, but does not converge
to $f$ pointwise. (Hint: take $f = 0$. Try to make the functions $f_n$ large at one
point.)

(d) Give an example where $f_n$ converges to $f$ pointwise, but does not converge to
$f$ in the $L^2$ metric. (Hint: take $f = 0$. Try to make the functions $f_n$ large in $L^2$
norm.)


5.3 Trigonometric Polynomials 


We now define the concept of a trigonometric polynomial. Just as polynomials are
combinations of the functions $x^n$ (sometimes called monomials), trigonometric
polynomials are combinations of the functions $e^{2\pi inx}$ (sometimes called characters).


Definition 5.3.1 (Characters) For every integer $n$, we let $e_n \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ denote
the function

$$e_n(x) := e^{2\pi inx}.$$

This is sometimes referred to as the character with frequency $n$.


Definition 5.3.2 (Trigonometric polynomials) A function $f$ in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ is said
to be a trigonometric polynomial if we can write $f = \sum_{n=-N}^{N} c_n e_n$ for some integer
$N \ge 0$ and some complex numbers $(c_n)_{n=-N}^{N}$.


Example 5.3.3 The function $f = 4e_{-2} + ie_{-1} - 2e_0 + 0e_1 - 3e_2$ is a trigonometric
polynomial; it can be written more explicitly as

$$f(x) = 4e^{-4\pi ix} + ie^{-2\pi ix} - 2 - 3e^{4\pi ix}.$$




Example 5.3.4 For any integer $n$, the function $\cos(2\pi nx)$ is a trigonometric
polynomial, since

$$\cos(2\pi nx) = \frac{e^{2\pi inx} + e^{-2\pi inx}}{2} = \frac{1}{2} e_{-n} + \frac{1}{2} e_n.$$

Similarly the function $\sin(2\pi nx) = \frac{-1}{2i} e_{-n} + \frac{1}{2i} e_n$ is a trigonometric polynomial. In
fact, any linear combination of sines and cosines is also a trigonometric polynomial,
for instance $3 + i\cos(2\pi x) + 4i\sin(4\pi x)$ is a trigonometric polynomial.


The Fourier theorem will allow us to write any function in C(R/Z; C) as a Fourier 
series, which is to trigonometric polynomials what power series is to polynomials. 
To do this we will use the inner product structure from the previous section. The key 
computation is 


Lemma 5.3.5 (Characters are an orthonormal system) For any integers $n$ and $m$,
we have $\langle e_n, e_m \rangle = 1$ when $n = m$ and $\langle e_n, e_m \rangle = 0$ when $n \ne m$. Also, we have
$\|e_n\|_2 = 1$.


Proof See Exercise 5.3.2. 


As a consequence, we have a formula for the coefficients of a trigonometric 
polynomial. 


Corollary 5.3.6 Let $f = \sum_{n=-N}^{N} c_n e_n$ be a trigonometric polynomial. Then we have
the formula

$$c_n = \langle f, e_n \rangle$$

for all integers $-N \le n \le N$. Also, we have $0 = \langle f, e_n \rangle$ whenever $n > N$ or
$n < -N$. Also, we have the identity

$$\|f\|_2^2 = \sum_{n=-N}^{N} |c_n|^2.$$


Proof See Exercise 5.3.3. 


We rewrite the conclusion of this corollary in a different way. 


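Definition 5.3.7 (Fourier transform) For any function $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, and any
integer $n \in \mathbf{Z}$, we define the $n$-th Fourier coefficient of $f$, denoted $\hat{f}(n)$, by the
formula

$$\hat{f}(n) := \langle f, e_n \rangle = \int_{[0,1]} f(x) e^{-2\pi inx} \, dx.$$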


The function $\hat{f}: \mathbf{Z} \to \mathbf{C}$ is called the Fourier transform of $f$.

From Corollary 5.3.6, we see that whenever $f = \sum_{n=-N}^{N} c_n e_n$ is a trigonometric
polynomial, we have

$$f = \sum_{n=-N}^{N} \langle f, e_n \rangle e_n = \sum_{n=-\infty}^{\infty} \langle f, e_n \rangle e_n$$

and in particular we have the Fourier inversion formula

$$f = \sum_{n=-\infty}^{\infty} \hat{f}(n) e_n$$

or in other words

$$f(x) = \sum_{n=-\infty}^{\infty} \hat{f}(n) e^{2\pi inx}.$$

The right-hand side is referred to as the Fourier series of $f$. Also, from the second
identity of Corollary 5.3.6 we have the Plancherel formula

$$\|f\|_2^2 = \sum_{n=-\infty}^{\infty} |\hat{f}(n)|^2.$$


Remark 5.3.8 We stress that at present we have only proven the Fourier inversion
and Plancherel formulae in the case when $f$ is a trigonometric polynomial. Note
that in this case the Fourier coefficients $\hat{f}(n)$ are mostly zero (indeed, they can
only be non-zero when $-N \le n \le N$), and so this infinite sum is really just a finite
sum in disguise. In particular there are no issues about what sense the above series
converge in; they both converge pointwise, uniformly, and in $L^2$ metric, since they
are just finite sums.
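For a concrete check of the coefficient formula and the Plancherel identity in the trigonometric polynomial case, the following Python sketch recovers the coefficients of Example 5.3.3 by numerical integration. The helper names and the midpoint quadrature are our own assumptions, chosen only for illustration.

```python
import cmath
import math


def character(n: int):
    """The character e_n(x) = exp(2*pi*i*n*x)."""
    return lambda x: cmath.exp(2j * math.pi * n * x)


def fourier_coefficient(f, n: int, steps: int = 256) -> complex:
    """Midpoint-rule approximation of <f, e_n>."""
    h = 1.0 / steps
    midpoints = [(k + 0.5) * h for k in range(steps)]
    return sum(f(x) * cmath.exp(-2j * math.pi * n * x) for x in midpoints) * h


# Coefficients c_n of Example 5.3.3, indexed by the frequency n.
coefficients = {-2: 4, -1: 1j, 0: -2, 1: 0, 2: -3}


def f(x: float) -> complex:
    return sum(c * character(n)(x) for n, c in coefficients.items())


# Fourier coefficients for frequencies -4..4; those with |n| > 2 should vanish.
recovered = {n: fourier_coefficient(f, n) for n in range(-4, 5)}

# Left-hand side of the Plancherel formula: the integral of |f|^2 over [0, 1].
norm_squared = fourier_coefficient(lambda x: abs(f(x)) ** 2, 0).real
coefficient_energy = sum(abs(c) ** 2 for c in coefficients.values())
```

Both `norm_squared` and `coefficient_energy` should agree (here with the value $16 + 1 + 4 + 0 + 9 = 30$), and each `recovered[n]` should match the corresponding $c_n$ up to rounding.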


In the next few sections we will extend the Fourier inversion and Plancherel 
formulae to general functions in C(R/Z; C), not just trigonometric polynomials. (It 
is also possible to extend the formula to discontinuous functions such as the square 
wave, but we will not do so here.) To do this we will need a version of the Weierstrass 
approximation theorem, this time requiring that a continuous periodic function be 
approximated uniformly by trigonometric polynomials. Just as convolutions were 
used in the proof of the polynomial Weierstrass approximation theorem, we will also 
need a notion of convolution tailored for periodic functions. 


— Exercise — 


Exercise 5.3.1 Show that the sum or product of any two trigonometric polynomials 
is again a trigonometric polynomial. 


Exercise 5.3.2 Prove Lemma 5.3.5.


Exercise 5.3.3 Prove Corollary 5.3.6. (Hint: use Lemma 5.3.5. For the second
identity, either use Pythagoras’ theorem and induction, or substitute $f = \sum_{n=-N}^{N} c_n e_n$
and expand everything out.)


5.4 Periodic Convolutions 


The goal of this section is to prove the Weierstrass approximation theorem for trigono- 
metric polynomials: 


Theorem 5.4.1 Let $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, and let $\varepsilon > 0$. Then there exists a
trigonometric polynomial $P$ such that $\|f - P\|_\infty \le \varepsilon$.


This theorem asserts that any continuous periodic function can be uniformly
approximated by trigonometric polynomials. To put it another way, if we let
$P(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ denote the space of all trigonometric polynomials, then the closure of
$P(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ in the $L^\infty$ metric is $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.

It is possible to prove this theorem directly from the Weierstrass approximation
theorem for polynomials (Theorem 3.8.3), and both theorems are special cases of a
much more general theorem known as the Stone-Weierstrass theorem, which we will
not discuss here. However we shall instead prove this theorem from scratch, in order 
to introduce a couple of interesting notions, notably that of periodic convolution. 
The proof here, though, should strongly remind you of the arguments used to prove 
Theorem 3.8.3. 


Definition 5.4.2 (Periodic convolution) Let $f, g \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$. Then we define the
periodic convolution $f * g: \mathbf{R} \to \mathbf{C}$ of $f$ and $g$ by the formula

$$f * g(x) := \int_{[0,1]} f(y) g(x - y) \, dy.$$


Remark 5.4.3 Note that this formula is slightly different from the convolution for 
compactly supported functions defined in Definition 3.8.9, because we are only 
integrating over [0, 1] and not on all of R. Thus, in principle we have given the symbol 
f * g two conflicting meanings. However, in practice there will be no confusion, 
because it is not possible for a non-zero function to both be periodic and compactly 
supported (Exercise 5.4.1). 


Lemma 5.4.4 (Basic properties of periodic convolution) Let $f, g, h \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.

(a) (Closure) The convolution $f * g$ is continuous and $\mathbf{Z}$-periodic. In other words,
$f * g \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$.

(b) (Commutativity) We have $f * g = g * f$.

(c) (Bilinearity) We have $f * (g + h) = f * g + f * h$ and $(f + g) * h = f * h + g * h$.
For any complex number $c$, we have $c(f * g) = (cf) * g = f * (cg)$.

Proof See Exercise 5.4.2.


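Now we observe an interesting identity: for any $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ and any integer
$n$, we have

$$f * e_n = \hat{f}(n) e_n.$$

To prove this, we compute

$$f * e_n(x) = \int_{[0,1]} f(y) e^{2\pi in(x - y)} \, dy = e^{2\pi inx} \int_{[0,1]} f(y) e^{-2\pi iny} \, dy = \hat{f}(n) e^{2\pi inx} = \hat{f}(n) e_n(x)$$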
as desired. 
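The identity $f * e_n = \hat{f}(n) e_n$ can also be observed numerically. The sketch below is an illustration under our own assumptions: the test function (a triangle wave), the helper names, and the midpoint quadrature are all hypothetical choices, not taken from the text.

```python
import cmath
import math


def triangle_wave(x: float) -> float:
    """A continuous Z-periodic function: |y - 1/2|, with y the fractional part of x."""
    return abs((x - math.floor(x)) - 0.5)


def character(n: int):
    """The character e_n(x) = exp(2*pi*i*n*x)."""
    return lambda x: cmath.exp(2j * math.pi * n * x)


def periodic_convolution(f, g, steps: int = 400):
    """Midpoint-rule approximation of (f * g)(x) = integral over [0,1] of f(y) g(x - y) dy."""
    h = 1.0 / steps
    nodes = [(k + 0.5) * h for k in range(steps)]

    def convolution(x: float) -> complex:
        return sum(f(y) * g(x - y) for y in nodes) * h

    return convolution


def fourier_coefficient(f, n: int, steps: int = 400) -> complex:
    """Midpoint-rule approximation of <f, e_n>."""
    h = 1.0 / steps
    nodes = [(k + 0.5) * h for k in range(steps)]
    return sum(f(y) * cmath.exp(-2j * math.pi * n * y) for y in nodes) * h


n = 3
e_n = character(n)
convolved = periodic_convolution(triangle_wave, e_n)   # f * e_n
f_hat_n = fourier_coefficient(triangle_wave, n)        # the coefficient of frequency n
sample_points = [0.0, 0.3, 0.75, 1.6]
differences = [abs(convolved(x) - f_hat_n * e_n(x)) for x in sample_points]
```

Every entry of `differences` should be at the level of rounding error, because the same factorisation $e^{2\pi in(x-y)} = e^{2\pi inx} e^{-2\pi iny}$ used in the proof also holds term by term in the Riemann sums.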


More generally, we see from Lemma 5.4.4(c) that for any trigonometric polynomial
$P = \sum_{n=-N}^{N} c_n e_n$, we have

$$f * P = \sum_{n=-N}^{N} c_n (f * e_n) = \sum_{n=-N}^{N} \hat{f}(n) c_n e_n.$$


Thus the periodic convolution of any function in C(R/Z; C) with a trigonometric 
polynomial, is again a trigonometric polynomial. (Compare with Lemma 3.8.13.) 
Next, we introduce the periodic analogue of an approximation to the identity. 


Definition 5.4.5 (Periodic approximation to the identity) Let $\varepsilon > 0$ and
$0 < \delta < 1/2$. A function $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ is said to be a periodic $(\varepsilon, \delta)$ approximation to the
identity if the following properties are true:

(a) $f(x) \ge 0$ for all $x \in \mathbf{R}$, and $\int_{[0,1]} f = 1$.
(b) We have $f(x) < \varepsilon$ for all $\delta \le |x| \le 1 - \delta$.


Now we have an analogue of Lemma 3.8.8: 


Lemma 5.4.6 For every $\varepsilon > 0$ and $0 < \delta < 1/2$, there exists a trigonometric
polynomial $P$ which is an $(\varepsilon, \delta)$ approximation to the identity.


Proof We sketch the proof of this lemma here, and leave the completion of it to
Exercise 5.4.3. Let $N \ge 1$ be an integer. We define the Fejér kernel $F_N$ to be the
function

$$F_N := \sum_{n=-N}^{N} \left( 1 - \frac{|n|}{N} \right) e_n.$$

Clearly $F_N$ is a trigonometric polynomial. We observe the identity

$$F_N = \frac{1}{N} \left| \sum_{n=0}^{N-1} e_n \right|^2$$

(why?). But from the geometric series formula (Lemma 7.3.3) we have

$$\sum_{n=0}^{N-1} e_n(x) = \frac{e_N(x) - e_0(x)}{e_1(x) - e_0(x)} = \frac{e^{\pi i(N-1)x} \sin(\pi Nx)}{\sin(\pi x)}$$

when $x$ is not an integer (why?), and hence we have the formula

$$F_N(x) = \frac{\sin(\pi Nx)^2}{N \sin(\pi x)^2}.$$

When $x$ is an integer, the geometric series formula does not apply, but one has
$F_N(x) = N$ in that case, as one can see by direct computation. In either case we see
that $F_N(x) \ge 0$ for any $x$. Also, we have

$$\int_{[0,1]} F_N(x) \, dx = \sum_{n=-N}^{N} \left( 1 - \frac{|n|}{N} \right) \int_{[0,1]} e_n = \left( 1 - \frac{|0|}{N} \right) 1 = 1$$

(why?). Finally, since $|\sin(\pi Nx)| \le 1$, we have

$$F_N(x) \le \frac{1}{N \sin(\pi x)^2} \le \frac{1}{N \sin(\pi \delta)^2}$$

whenever $\delta \le |x| \le 1 - \delta$ (this is because $\sin$ is increasing on $[0, \pi/2]$ and decreasing
on $[\pi/2, \pi]$). Thus by choosing $N$ large enough, we can make $F_N(x) < \varepsilon$ for all
$\delta \le |x| \le 1 - \delta$.
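The two descriptions of the Fejér kernel, and the final bound, can be compared numerically. The Python sketch below is only a sanity check under our own naming (`fejer_from_definition`, `fejer_closed_form`); it does not replace the argument requested in Exercise 5.4.3.

```python
import cmath
import math


def fejer_from_definition(N: int, x: float) -> float:
    """F_N(x) as the sum over |n| <= N of (1 - |n|/N) e_n(x)."""
    total = sum((1 - abs(n) / N) * cmath.exp(2j * math.pi * n * x)
                for n in range(-N, N + 1))
    return total.real  # the imaginary parts cancel in pairs n, -n


def fejer_closed_form(N: int, x: float) -> float:
    """F_N(x) = sin(pi N x)^2 / (N sin(pi x)^2), with the value N at integers."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:  # x is (numerically) an integer
        return float(N)
    return math.sin(math.pi * N * x) ** 2 / (N * s ** 2)


# Compare the two formulas at a few sample points.
N = 8
sample_points = [0.0, 0.1, 0.37, 0.5, 0.9, 1.25]
gaps = [abs(fejer_from_definition(N, x) - fejer_closed_form(N, x))
        for x in sample_points]

# Check the bound F_N(x) <= 1 / (N sin(pi delta)^2) on [delta, 1 - delta].
delta, big_N = 0.1, 200
bound = 1.0 / (big_N * math.sin(math.pi * delta) ** 2)
grid = [delta + k * (1 - 2 * delta) / 100 for k in range(101)]
largest_value = max(fejer_closed_form(big_N, x) for x in grid)
```

Increasing `big_N` makes `bound`, and hence the kernel away from the integers, as small as desired, which is the content of the last step of the proof.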


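Proof of Theorem 5.4.1 Let $f$ be any element of $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$; we know that $f$ is
bounded, so that we have some $M > 0$ such that $|f(x)| \le M$ for all $x \in \mathbf{R}$.

Let $\varepsilon > 0$ be arbitrary. Since $f$ is uniformly continuous, there exists a $\delta > 0$
such that $|f(x) - f(y)| \le \varepsilon$ whenever $|x - y| \le \delta$; shrinking $\delta$ if necessary, we may
assume that $\delta < 1/2$. Now use Lemma 5.4.6 to find a trigonometric polynomial $P$ which
is an $(\varepsilon, \delta)$ approximation to the identity. Then $f * P$ is also a trigonometric
polynomial. We now estimate $\|f - f * P\|_\infty$.

Let $x$ be any real number. We have

$$\begin{aligned}
|f(x) - f * P(x)| &= |f(x) - P * f(x)| \\
&= \left| f(x) - \int_{[0,1]} f(x - y) P(y) \, dy \right| \\
&= \left| \int_{[0,1]} f(x) P(y) \, dy - \int_{[0,1]} f(x - y) P(y) \, dy \right| \\
&= \left| \int_{[0,1]} (f(x) - f(x - y)) P(y) \, dy \right| \\
&\le \int_{[0,1]} |f(x) - f(x - y)| P(y) \, dy.
\end{aligned}$$

The right-hand side can be split as

$$\int_{[0,\delta]} |f(x) - f(x - y)| P(y) \, dy + \int_{[\delta,1-\delta]} |f(x) - f(x - y)| P(y) \, dy + \int_{[1-\delta,1]} |f(x) - f(x - y)| P(y) \, dy$$

which we can bound from above by

$$\begin{aligned}
&\le \int_{[0,\delta]} \varepsilon P(y) \, dy + \int_{[\delta,1-\delta]} 2M\varepsilon \, dy + \int_{[1-\delta,1]} |f(x - 1) - f(x - y)| P(y) \, dy \\
&\le \int_{[0,\delta]} \varepsilon P(y) \, dy + \int_{[\delta,1-\delta]} 2M\varepsilon \, dy + \int_{[1-\delta,1]} \varepsilon P(y) \, dy \\
&\le \varepsilon + 2M\varepsilon + \varepsilon \\
&= (2M + 2)\varepsilon.
\end{aligned}$$

Thus we have $\|f - f * P\|_\infty \le (2M + 2)\varepsilon$. Since $M$ is fixed and $\varepsilon$ is arbitrary, we
can thus make $f * P$ arbitrarily close to $f$ in sup norm, which proves the periodic
Weierstrass approximation theorem.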
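To see the approximation $f * F_N \approx f$ at work, here is a small Python sketch. It relies on the identity $f * P = \sum_n \hat{f}(n) c_n e_n$ established earlier, applied to the Fejér kernel $P = F_N$. The test function (a triangle wave), the helper names, and the grid used to estimate the sup norm are our own assumptions.

```python
import cmath
import math


def triangle_wave(x: float) -> float:
    """A continuous Z-periodic function: |y - 1/2|, with y the fractional part of x."""
    return abs((x - math.floor(x)) - 0.5)


def fourier_coefficient(f, n: int, steps: int = 512) -> complex:
    """Midpoint-rule approximation of <f, e_n>."""
    h = 1.0 / steps
    nodes = [(k + 0.5) * h for k in range(steps)]
    return sum(f(y) * cmath.exp(-2j * math.pi * n * y) for y in nodes) * h


def convolve_with_fejer(f, N: int):
    """Return f * F_N as the trigonometric polynomial sum of (1 - |n|/N) f^(n) e_n."""
    coeffs = {n: (1 - abs(n) / N) * fourier_coefficient(f, n)
              for n in range(-N, N + 1)}

    def polynomial(x: float) -> complex:
        return sum(c * cmath.exp(2j * math.pi * n * x) for n, c in coeffs.items())

    return polynomial


def sup_distance(f, g, grid: int = 200) -> float:
    """Largest |f - g| over a grid of [0, 1); a lower bound for the sup norm."""
    return max(abs(f(k / grid) - g(k / grid)) for k in range(grid))


errors = {N: sup_distance(triangle_wave, convolve_with_fejer(triangle_wave, N))
          for N in (4, 32)}
```

The entry `errors[32]` should be noticeably smaller than `errors[4]`, in line with the theorem; the sketch says nothing about the rate of convergence in general.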


— Exercise — 


Exercise 5.4.1 Show that if $f: \mathbf{R} \to \mathbf{C}$ is both compactly supported and $\mathbf{Z}$-periodic,
then it is identically zero.


Exercise 5.4.2 Prove Lemma 5.4.4. (Hint: to prove that $f * g$ is continuous, you
will have to do something like use the fact that $f$ is bounded, and $g$ is uniformly
continuous, or vice versa. To prove that $f * g = g * f$, you will need to use the
periodicity to “cut and paste” the interval $[0, 1]$.)


Exercise 5.4.3 Fill in the gaps marked (why?) in Lemma 5.4.6. (Hint: for the first
identity, use the identities $|z|^2 = z\overline{z}$, $\overline{e_n} = e_{-n}$, and $e_n e_m = e_{n+m}$.)


5.5 The Fourier and Plancherel Theorems 


Using the Weierstrass approximation theorem (Theorem 5.4.1), we can now gener- 
alize the Fourier and Plancherel identities to arbitrary continuous periodic functions. 


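Theorem 5.5.1 (Fourier theorem) For any $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, the series
$\sum_{n=-\infty}^{\infty} \hat{f}(n) e_n$ converges in $L^2$ metric to $f$. In other words, we have

$$\lim_{N \to \infty} \left\| f - \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_2 = 0.$$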


Proof Let $\varepsilon > 0$. We have to show that there exists an $N_0$ such that
$\left\| f - \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_2 \le \varepsilon$ for all $N > N_0$.

By the Weierstrass approximation theorem (Theorem 5.4.1), we can find a
trigonometric polynomial $P = \sum_{n=-N_0}^{N_0} c_n e_n$ such that $\|f - P\|_\infty \le \varepsilon$, for some $N_0 > 0$.
In particular we have $\|f - P\|_2 \le \varepsilon$.

Now let $N > N_0$, and let $F_N := \sum_{n=-N}^{N} \hat{f}(n) e_n$. We claim that $\|f - F_N\|_2 \le \varepsilon$.
First observe that for any $|m| \le N$, we have

$$\langle f - F_N, e_m \rangle = \langle f, e_m \rangle - \sum_{n=-N}^{N} \hat{f}(n) \langle e_n, e_m \rangle = \hat{f}(m) - \hat{f}(m) = 0,$$

where we have used Lemma 5.3.5. In particular we have

$$\langle f - F_N, F_N - P \rangle = 0$$

since we can write $F_N - P$ as a linear combination of the $e_m$ for which $|m| \le N$. By
Pythagoras’ theorem we therefore have

$$\|f - P\|_2^2 = \|f - F_N\|_2^2 + \|F_N - P\|_2^2$$

and in particular

$$\|f - F_N\|_2 \le \|f - P\|_2 \le \varepsilon$$

as desired.

Remark 5.5.2 Note that we have only obtained convergence of the Fourier series
$\sum_{n=-\infty}^{\infty} \hat{f}(n) e_n$ to $f$ in the $L^2$ metric. One may ask whether one has convergence
in the uniform or pointwise sense as well, but it turns out (perhaps somewhat
surprisingly) that the answer is no to both of those questions. However, if one assumes
that the function $f$ is not only continuous, but is also differentiable, then one can
recover pointwise convergence; if one assumes continuously differentiable, then one
gets uniform convergence as well. These results are beyond the scope of this text and
will not be proven here. However, we will prove one theorem about when one can
improve the $L^2$ convergence to uniform convergence.
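The orthogonality step in the proof of Theorem 5.5.1 says, in effect, that the partial Fourier sum is at least as good an $L^2$ approximation to $f$ as any other trigonometric polynomial with frequencies in $[-N, N]$. The Python sketch below checks this on one example. The test function, the competing polynomial (a Fejér-type mean), and all helper names are our own hypothetical choices, and the integrals are replaced by midpoint Riemann sums.

```python
import cmath
import math

STEPS = 256
NODES = [(k + 0.5) / STEPS for k in range(STEPS)]


def triangle_wave(x: float) -> float:
    """A continuous Z-periodic function: |y - 1/2|, with y the fractional part of x."""
    return abs((x - math.floor(x)) - 0.5)


def character(n: int):
    return lambda x: cmath.exp(2j * math.pi * n * x)


def inner_product(f, g) -> complex:
    """Midpoint-rule approximation of <f, g> on [0, 1]."""
    return sum(f(x) * complex(g(x)).conjugate() for x in NODES) / STEPS


def l2_distance(f, g) -> float:
    return math.sqrt(sum(abs(f(x) - g(x)) ** 2 for x in NODES) / STEPS)


def trigonometric_polynomial(coeffs):
    def polynomial(x: float) -> complex:
        return sum(c * cmath.exp(2j * math.pi * n * x) for n, c in coeffs.items())
    return polynomial


N = 5
f_hat = {n: inner_product(triangle_wave, character(n)) for n in range(-N, N + 1)}

# The partial Fourier sum, called F_N in the proof above.
partial_sum = trigonometric_polynomial(f_hat)

# A competing trigonometric polynomial with the same frequency range.
competitor = trigonometric_polynomial(
    {n: (1 - abs(n) / N) * c for n, c in f_hat.items()})


def residual(x: float) -> complex:
    return triangle_wave(x) - partial_sum(x)


# <f - F_N, e_m> should vanish for every |m| <= N.
orthogonality_defects = [abs(inner_product(residual, character(m)))
                         for m in range(-N, N + 1)]
```

Comparing `l2_distance(triangle_wave, partial_sum)` with `l2_distance(triangle_wave, competitor)` should show the partial sum doing at least as well, as Pythagoras' theorem predicts.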


Theorem 5.5.3 Let $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, and suppose that the series $\sum_{n=-\infty}^{\infty} |\hat{f}(n)|$ is
absolutely convergent. Then the series $\sum_{n=-\infty}^{\infty} \hat{f}(n) e_n$ converges uniformly to $f$. In
other words, we have

$$\lim_{N \to \infty} \left\| f - \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_\infty = 0.$$

Proof By the Weierstrass M-test (Theorem 3.5.7), we see that $\sum_{n=-\infty}^{\infty} \hat{f}(n) e_n$
converges to some function $F$, which by Lemma 5.1.5(c) is also continuous and
$\mathbf{Z}$-periodic. (Strictly speaking, the Weierstrass M-test was phrased for series from
$n = 1$ to $n = \infty$, but also works for series from $n = -\infty$ to $n = +\infty$; this can be
seen by splitting the doubly infinite series into two pieces.) Thus

$$\lim_{N \to \infty} \left\| F - \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_\infty = 0$$

which implies that

$$\lim_{N \to \infty} \left\| F - \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_2 = 0$$

since the $L^2$ norm is always less than or equal to the $L^\infty$ norm. But the sequence
$\sum_{n=-N}^{N} \hat{f}(n) e_n$ is already converging in $L^2$ metric to $f$ by the Fourier theorem, so
can only converge in $L^2$ metric to $F$ if $F = f$ (cf. Proposition 1.1.20). Thus $F = f$,
and so we have

$$\lim_{N \to \infty} \left\| f - \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_\infty = 0$$


as desired. 
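As an illustration of Theorem 5.5.3, the sketch below measures the sup-norm error of the partial Fourier sums of a function whose (numerically computed) Fourier coefficients decay quickly. The triangle wave, the truncation level `MAX_N`, the evaluation grid, and the helper names are our own assumptions; a finite computation of this kind can only suggest, not prove, absolute convergence.

```python
import cmath
import math


def triangle_wave(x: float) -> float:
    """A continuous Z-periodic function: |y - 1/2|, with y the fractional part of x."""
    return abs((x - math.floor(x)) - 0.5)


def fourier_coefficient(f, n: int, steps: int = 512) -> complex:
    """Midpoint-rule approximation of <f, e_n>."""
    h = 1.0 / steps
    nodes = [(k + 0.5) * h for k in range(steps)]
    return sum(f(y) * cmath.exp(-2j * math.pi * n * y) for y in nodes) * h


MAX_N = 21
f_hat = {n: fourier_coefficient(triangle_wave, n) for n in range(-MAX_N, MAX_N + 1)}


def partial_sum(N: int):
    """The N-th partial Fourier sum, using the coefficients computed above."""
    def S(x: float) -> complex:
        return sum(f_hat[n] * cmath.exp(2j * math.pi * n * x)
                   for n in range(-N, N + 1))
    return S


def sup_error(N: int, grid: int = 200) -> float:
    """Largest |f - S_N| over a grid of [0, 1); a lower bound for the sup norm."""
    S = partial_sum(N)
    return max(abs(triangle_wave(k / grid) - S(k / grid)) for k in range(grid))


sup_errors = {N: sup_error(N) for N in (1, 5, 21)}

# Partial sum of |f^(n)| over |n| <= MAX_N; it should stabilise as MAX_N grows.
absolute_sum = sum(abs(c) for c in f_hat.values())
```

The values in `sup_errors` should decrease as `N` grows, consistent with uniform convergence for this particular function.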


As a corollary of the Fourier theorem, we obtain 


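Theorem 5.5.4 (Plancherel theorem) For any $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, the series
$\sum_{n=-\infty}^{\infty} |\hat{f}(n)|^2$ is absolutely convergent, and

$$\|f\|_2^2 = \sum_{n=-\infty}^{\infty} |\hat{f}(n)|^2.$$

This theorem is also known as Parseval’s theorem.


Proof Let $\varepsilon > 0$. By the Fourier theorem we know that

$$\left\| f - \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_2 \le \varepsilon$$

if $N$ is large enough (depending on $\varepsilon$). In particular, by the triangle inequality this
implies that

$$\|f\|_2 - \varepsilon \le \left\| \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_2 \le \|f\|_2 + \varepsilon.$$

On the other hand, by Corollary 5.3.6 we have

$$\left\| \sum_{n=-N}^{N} \hat{f}(n) e_n \right\|_2 = \left( \sum_{n=-N}^{N} |\hat{f}(n)|^2 \right)^{1/2}$$

and hence

$$(\|f\|_2 - \varepsilon)^2 \le \sum_{n=-N}^{N} |\hat{f}(n)|^2 \le (\|f\|_2 + \varepsilon)^2.$$

Taking lim sup, we obtain

$$(\|f\|_2 - \varepsilon)^2 \le \limsup_{N \to \infty} \sum_{n=-N}^{N} |\hat{f}(n)|^2 \le (\|f\|_2 + \varepsilon)^2.$$

Since $\varepsilon$ is arbitrary, we thus obtain by the squeeze test that

$$\limsup_{N \to \infty} \sum_{n=-N}^{N} |\hat{f}(n)|^2 = \|f\|_2^2$$

and the claim follows.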
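The Plancherel identity can be observed numerically by comparing $\|f\|_2^2$ with the partial sums $\sum_{n=-N}^{N} |\hat{f}(n)|^2$. The Python sketch below does this for a triangle wave; the test function, helper names, and midpoint quadrature are assumptions of ours made purely for illustration.

```python
import cmath
import math

STEPS = 1024
NODES = [(k + 0.5) / STEPS for k in range(STEPS)]


def triangle_wave(x: float) -> float:
    """A continuous Z-periodic function: |y - 1/2|, with y the fractional part of x."""
    return abs((x - math.floor(x)) - 0.5)


def fourier_coefficient(f, n: int) -> complex:
    """Midpoint-rule approximation of <f, e_n>."""
    return sum(f(x) * cmath.exp(-2j * math.pi * n * x) for x in NODES) / STEPS


# Approximation of the squared L^2 norm, the integral of |f|^2 over [0, 1].
norm_squared = sum(abs(triangle_wave(x)) ** 2 for x in NODES) / STEPS


def plancherel_partial_sum(N: int) -> float:
    """Sum of |f^(n)|^2 over -N <= n <= N."""
    return sum(abs(fourier_coefficient(triangle_wave, n)) ** 2
               for n in range(-N, N + 1))


partial_sums = {N: plancherel_partial_sum(N) for N in (0, 1, 5, 25)}
```

The partial sums should increase with `N` and approach `norm_squared`, which for this function is close to $\int_{[0,1]} (x - \frac{1}{2})^2 \, dx = \frac{1}{12}$.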


There are many other properties of the Fourier transform, but we will not develop 
them here. In the exercises you will see a small number of applications of the Fourier 
and Plancherel theorems. 


— Exercise —

Exercise 5.5.1 Let $f$ be a function in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, and define the trigonometric
Fourier coefficients $a_n, b_n$ for $n = 0, 1, 2, 3, \ldots$ by

$$a_n := 2 \int_{[0,1]} f(x) \cos(2\pi nx) \, dx; \qquad b_n := 2 \int_{[0,1]} f(x) \sin(2\pi nx) \, dx.$$

(a) Show that the series

$$\frac{1}{2} a_0 + \sum_{n=1}^{\infty} (a_n \cos(2\pi nx) + b_n \sin(2\pi nx))$$

converges in $L^2$ metric to $f$. (Hint: use the Fourier theorem, and break up
the exponentials into sines and cosines. Combine the positive $n$ terms with the
negative $n$ terms.)

(b) Show that if $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are absolutely convergent, then the above
series actually converges uniformly to $f$, and not just in $L^2$ metric. (Hint: use
Theorem 5.5.3.)


Exercise 5.5.2 Let $f(x)$ be the function defined by $f(x) = (1 - 2x)^2$ when
$x \in [0, 1)$, and extended to be $\mathbf{Z}$-periodic for the rest of the real line.

(a) Using Exercise 5.5.1, show that the series

$$\frac{1}{3} + \sum_{n=1}^{\infty} \frac{4}{\pi^2 n^2} \cos(2\pi nx)$$

converges uniformly to $f$.

(b) Conclude that $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$. (Hint: evaluate the above series at $x = 0$.)

(c) Conclude that $\sum_{n=1}^{\infty} \frac{1}{n^4} = \frac{\pi^4}{90}$. (Hint: expand the cosines in terms of exponentials,
and use Plancherel’s theorem.)


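Exercise 5.5.3 If $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ and $P = \sum_{n=-N}^{N} c_n e_n$ is a trigonometric
polynomial, show that

$$\widehat{f * P}(n) = \hat{f}(n) c_n = \hat{f}(n) \hat{P}(n)$$

for all integers $n$. More generally, if $f, g \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, show that

$$\widehat{f * g}(n) = \hat{f}(n) \hat{g}(n)$$

for all integers $n$. (A fancy way of saying this is that the Fourier transform intertwines
convolution and multiplication.)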


Exercise 5.5.4 Let $f \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$ be a function which is differentiable, and whose
derivative $f'$ is also continuous (where we define derivatives of complex-valued
functions in exactly the same way as for their real-valued counterparts). Show that
$f'$ also lies in $C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$, and that $\widehat{f'}(n) = 2\pi in \hat{f}(n)$ for all integers $n$.


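Exercise 5.5.5 Let $f, g \in C(\mathbf{R}/\mathbf{Z}; \mathbf{C})$. Prove the Parseval identity

$$\operatorname{Re} \int_0^1 f(x) \overline{g(x)} \, dx = \operatorname{Re} \sum_{n \in \mathbf{Z}} \hat{f}(n) \overline{\hat{g}(n)}.$$

(Hint: apply the Plancherel theorem to $f + g$ and $f - g$, and subtract the two.) Then
conclude that the real parts can be removed, thus

$$\int_0^1 f(x) \overline{g(x)} \, dx = \sum_{n \in \mathbf{Z}} \hat{f}(n) \overline{\hat{g}(n)}.$$

(Hint: apply the first identity with $f$ replaced by $if$.)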


Exercise 5.5.6 In this exercise we shall develop the theory of Fourier series for 
functions of any fixed period L. 

Let $L > 0$, and let $f: \mathbf{R} \to \mathbf{C}$ be a complex-valued function which is continuous
and $L$-periodic. Define the numbers $c_n$ for every integer $n$ by

$$c_n := \frac{1}{L} \int_{[0,L]} f(x) e^{-2\pi inx/L} \, dx.$$

(a) Show that the series

$$\sum_{n=-\infty}^{\infty} c_n e^{2\pi inx/L}$$

converges in $L^2$ metric to $f$. More precisely, show that

$$\lim_{N \to \infty} \int_{[0,L]} \left| f(x) - \sum_{n=-N}^{N} c_n e^{2\pi inx/L} \right|^2 dx = 0.$$

(Hint: apply the Fourier theorem to the function $f(Lx)$.)

(b) If the series $\sum_{n=-\infty}^{\infty} |c_n|$ is absolutely convergent, show that

$$\sum_{n=-\infty}^{\infty} c_n e^{2\pi inx/L}$$

converges uniformly to $f$.

(c) Show that

$$\frac{1}{L} \int_{[0,L]} |f(x)|^2 \, dx = \sum_{n=-\infty}^{\infty} |c_n|^2.$$


(Hint: apply the Plancherel theorem to the function $f(Lx)$.)


Chapter 6
Several Variable Differential Calculus


6.1 Linear Transformations 


We shall now switch to a different topic, namely that of differentiation in several
variable calculus. More precisely, we shall be dealing with maps $f: \mathbf{R}^n \to \mathbf{R}^m$ from
one Euclidean space to another, and trying to understand what the derivative of such
a map is.

Before we do so, however, we need to recall some notions from linear algebra, 
most importantly that of a linear transformation and a matrix. We shall be rather brief 
here; a more thorough treatment of this material can be found in any linear algebra 
text. 


Definition 6.1.1 (Row vectors) Let $n \ge 1$ be an integer. We refer to elements of $\mathbf{R}^n$
as $n$-dimensional row vectors. A typical $n$-dimensional row vector may take the form
$x = (x_1, x_2, \ldots, x_n)$, which we abbreviate as $(x_i)_{1 \le i \le n}$; the quantities $x_1, x_2, \ldots, x_n$
are of course real numbers. If $(x_i)_{1 \le i \le n}$ and $(y_i)_{1 \le i \le n}$ are $n$-dimensional row vectors,
we can define their vector sum by

$$(x_i)_{1 \le i \le n} + (y_i)_{1 \le i \le n} := (x_i + y_i)_{1 \le i \le n},$$

and also if $c \in \mathbf{R}$ is any scalar, we can define the scalar product $c(x_i)_{1 \le i \le n}$ by

$$c(x_i)_{1 \le i \le n} := (cx_i)_{1 \le i \le n}.$$

Of course one has similar operations on $\mathbf{R}^m$ as well. However, if $n \ne m$, then we
do not define any operation of vector addition between vectors in $\mathbf{R}^n$ and vectors in
$\mathbf{R}^m$ (e.g., $(2, 3, 4) + (5, 6)$ is undefined). We also refer to the vector $(0, \ldots, 0)$ in $\mathbf{R}^n$
as the zero vector and also denote it by 0. (Strictly speaking, we should denote the
zero vector of $\mathbf{R}^n$ by $0_{\mathbf{R}^n}$, as they are technically distinct from each other and from
the number zero, but we shall not take care to make this distinction.) We abbreviate
$(-1)x$ as $-x$.

The operations of vector addition and scalar multiplication obey a number of basic
properties:


Lemma 6.1.2 ($\mathbf{R}^n$ is a vector space) Let $x, y, z$ be vectors in $\mathbf{R}^n$, and let $c, d$ be
real numbers. Then we have the commutativity property $x + y = y + x$, the additive
associativity property $(x + y) + z = x + (y + z)$, the additive identity property
$x + 0 = 0 + x = x$, the additive inverse property $x + (-x) = (-x) + x = 0$,
the multiplicative associativity property $(cd)x = c(dx)$, the distributivity properties
$c(x + y) = cx + cy$ and $(c + d)x = cx + dx$, and the multiplicative identity property
$1x = x$.


Proof See Exercise 6.1.1. 


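Definition 6.1.3 (Transpose) If $(x_i)_{1 \le i \le n} = (x_1, x_2, \ldots, x_n)$ is an $n$-dimensional row
vector, we can define its transpose $(x_i)^T_{1 \le i \le n}$ by

$$(x_i)^T_{1 \le i \le n} = (x_1, x_2, \ldots, x_n)^T := \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.$$

We refer to objects such as $(x_i)^T_{1 \le i \le n}$ as $n$-dimensional column vectors.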


Remark 6.1.4 There is no functional difference between a row vector and a column 
vector (e.g., one can add and scalar multiply column vectors just as well as we 
can row vectors); however we shall (rather annoyingly) need to transpose our row 
vectors into column vectors in order to be consistent with the conventions of matrix 
multiplication, which we will see later. Note that we view row vectors and column 
vectors as residing in different spaces; thus for instance we will not define the sum 
of a row vector with a column vector, even when they have the same number of 
elements. 


Definition 6.1.5 (Standard basis row vectors) We identify $n$ special vectors in $\mathbf{R}^n$,
the standard basis row vectors $e_1, \ldots, e_n$. For each $1 \le j \le n$, $e_j$ is the vector which
has 0 in all entries except for the $j$-th entry, which is equal to 1.


For instance, in $\mathbf{R}^3$, we have $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$, and $e_3 = (0, 0, 1)$. Note
that if $x = (x_j)_{1 \le j \le n}$ is a vector in $\mathbf{R}^n$, then

$$x = x_1 e_1 + x_2 e_2 + \ldots + x_n e_n = \sum_{j=1}^{n} x_j e_j,$$

or in other words every vector in $\mathbf{R}^n$ is a linear combination of the standard basis
vectors $e_1, \ldots, e_n$. (The notation $\sum_{j=1}^{n} x_j e_j$ is unambiguous because the operation of
vector addition is both commutative and associative.) Of course, just as every row
vector is a linear combination of standard basis row vectors, every column vector is
a linear combination of standard basis column vectors:

$$x^T = x_1 e_1^T + x_2 e_2^T + \ldots + x_n e_n^T = \sum_{j=1}^{n} x_j e_j^T.$$


There are (many) other ways to create a basis for R”, but this is a topic for a linear 
algebra text and will not be discussed here. 


Definition 6.1.6 (Linear transformations) A linear transformation $T : \mathbf{R}^n \to \mathbf{R}^m$ is
any function from one Euclidean space $\mathbf{R}^n$ to another $\mathbf{R}^m$ which obeys the following
two axioms:


(a) (Additivity) For every $x, x' \in \mathbf{R}^n$, we have $T(x + x') = Tx + Tx'$.
(b) (Homogeneity) For every $x \in \mathbf{R}^n$ and every $c \in \mathbf{R}$, we have $T(cx) = cTx$.


Example 6.1.7 The dilation operator $T_1 : \mathbf{R}^3 \to \mathbf{R}^3$ defined by $T_1 x := 5x$ (i.e., it
dilates each vector $x$ by a factor of 5) is a linear transformation, since $5(x + x') = 5x + 5x'$
for all $x, x' \in \mathbf{R}^3$ and $5(cx) = c(5x)$ for all $x \in \mathbf{R}^3$ and $c \in \mathbf{R}$.


Example 6.1.8 The rotation operator $T_2 : \mathbf{R}^2 \to \mathbf{R}^2$ defined by a counterclockwise
rotation by $\pi/2$ radians around the origin (so that $T_2(1, 0) = (0, 1)$, $T_2(0, 1) = (-1, 0)$,
etc.) is a linear transformation; this can best be seen geometrically rather
than analytically.


Example 6.1.9 The projection operator $T_3 : \mathbf{R}^3 \to \mathbf{R}^2$ defined by $T_3(x, y, z) := (x, y)$
is a linear transformation (why?). The inclusion operator $T_4 : \mathbf{R}^2 \to \mathbf{R}^3$ defined by
$T_4(x, y) := (x, y, 0)$ is also a linear transformation (why?). Finally, the identity operator
$I_n : \mathbf{R}^n \to \mathbf{R}^n$, defined for any $n$ by $I_n x := x$, is also a linear transformation (why?).
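
The two axioms of Definition 6.1.6 can be spot-checked numerically for the operators in Examples 6.1.7 to 6.1.9. The following is only a sketch: the helper names (add, scale, looks_linear and so on) are our own, vectors are modelled as Python tuples, and passing a check on one pair of vectors is of course evidence rather than a proof of linearity.

```python
def add(x, y):
    """Componentwise sum of two vectors given as tuples."""
    return tuple(a + b for a, b in zip(x, y))


def scale(c, x):
    """Scalar multiple c*x of a vector x."""
    return tuple(c * a for a in x)


def dilation(x):
    """T_1 x := 5x on R^3 (Example 6.1.7)."""
    return scale(5, x)


def rotation(x):
    """T_2: counterclockwise rotation by pi/2 on R^2 (Example 6.1.8)."""
    return (-x[1], x[0])


def projection(x):
    """T_3(x, y, z) := (x, y) (Example 6.1.9)."""
    return (x[0], x[1])


def inclusion(x):
    """T_4(x, y) := (x, y, 0) (Example 6.1.9)."""
    return (x[0], x[1], 0)


def close(u, v, tol=1e-12):
    """True if u and v agree componentwise up to tol."""
    return len(u) == len(v) and all(abs(a - b) <= tol for a, b in zip(u, v))


def looks_linear(T, x, y, c):
    """Spot-check additivity and homogeneity of T on the vectors x, y and scalar c."""
    additive = close(T(add(x, y)), add(T(x), T(y)))
    homogeneous = close(T(scale(c, x)), scale(c, T(x)))
    return additive and homogeneous


print(looks_linear(dilation, (1.0, 2.0, 3.0), (4.0, -5.0, 6.0), 2.5))
print(looks_linear(lambda x: (x[0] ** 2,), (1.0,), (2.0,), 3.0))  # squaring is not linear
```

A map such as squaring fails the additivity check already on a single pair of vectors, which is the quickest way to see that it is not a linear transformation.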


As we shall shortly see, there is a connection between linear transformations and 
matrices. 


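Definition 6.1.10 (Matrices) An $m \times n$ matrix is an object $A$ of the form

$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix};$$

we shall abbreviate this as

$$A = (a_{ij})_{1 \le i \le m; 1 \le j \le n}.$$

In particular, n-dimensional row vectors are $1 \times n$ matrices, while n-dimensional
column vectors are $n \times 1$ matrices.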


Definition 6.1.11 (Matrix product) Given an $m \times n$ matrix $A$ and an $n \times p$ matrix
$B$, we can define the matrix product $AB$ to be the $m \times p$ matrix defined as

$$(a_{ij})_{1 \le i \le m; 1 \le j \le n} \, (b_{jk})_{1 \le j \le n; 1 \le k \le p} := \left( \sum_{j=1}^n a_{ij} b_{jk} \right)_{1 \le i \le m; 1 \le k \le p}.$$

In particular, if $x^T = (x_j)_{1 \le j \le n}^T$ is an n-dimensional column vector, and
$A = (a_{ij})_{1 \le i \le m; 1 \le j \le n}$ is an $m \times n$ matrix, then $Ax^T$ is an m-dimensional column vector:

$$Ax^T = \left( \sum_{j=1}^n a_{ij} x_j \right)_{1 \le i \le m}^T.$$

We now relate matrices to linear transformations. If $A$ is an $m \times n$ matrix, we can
define the transformation $L_A : \mathbf{R}^n \to \mathbf{R}^m$ by the formula

$$(L_A x)^T := Ax^T.$$
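

Example 6.1.12 If $A$ is the matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$$

and $x = (x_1, x_2, x_3)$ is a 3-dimensional row vector, then $L_A x$ is the 2-dimensional row
vector defined by

$$(L_A x)^T = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} x_1 + 2x_2 + 3x_3 \\ 4x_1 + 5x_2 + 6x_3 \end{pmatrix}$$

or in other words

$$L_A(x_1, x_2, x_3) = (x_1 + 2x_2 + 3x_3, 4x_1 + 5x_2 + 6x_3).$$

More generally, if

$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}$$

then we have

$$L_A (x_j)_{1 \le j \le n} = \left( \sum_{j=1}^n a_{ij} x_j \right)_{1 \le i \le m}.$$

For any $m \times n$ matrix $A$, the transformation $L_A$ is automatically linear; one can easily
verify that $L_A(x + y) = L_A x + L_A y$ and $L_A(cx) = c(L_A x)$ for any n-dimensional row
vectors $x, y$ and any scalar $c$. (Why?)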
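
The formula for $L_A$ translates directly into a few lines of code. The sketch below is an illustration under our own conventions: a matrix is stored as a list of rows, a row vector as a tuple, and the function name L is our own choice. It reproduces the matrix of Example 6.1.12.

```python
def L(A):
    """Return the map x -> L_A x for a matrix A stored as a list of rows."""
    def apply(x):
        # Each row of A must have as many entries as x has components.
        assert all(len(row) == len(x) for row in A), "dimension mismatch"
        # The i-th component of L_A x is sum_j a_ij * x_j.
        return tuple(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A)
    return apply


A = [[1, 2, 3],
     [4, 5, 6]]
L_A = L(A)

# L_A(x1, x2, x3) = (x1 + 2*x2 + 3*x3, 4*x1 + 5*x2 + 6*x3)
print(L_A((1, 0, -1)))
print(L_A((1, 1, 1)))
```

Because each output component is a sum of products $a_{ij} x_j$, additivity and homogeneity of $L_A$ reduce to the distributive and associative laws for real numbers.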


Perhaps surprisingly, the converse is also true, i.e., every linear transformation
from $\mathbf{R}^n$ to $\mathbf{R}^m$ is given by a matrix:


Lemma 6.1.13 Let $T : \mathbf{R}^n \to \mathbf{R}^m$ be a linear transformation. Then there exists
exactly one $m \times n$ matrix $A$ such that $T = L_A$.


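Proof Suppose $T : \mathbf{R}^n \to \mathbf{R}^m$ is a linear transformation. Let $e_1, e_2, \ldots, e_n$ be the
standard basis row vectors of $\mathbf{R}^n$. Then $Te_1, Te_2, \ldots, Te_n$ are vectors in $\mathbf{R}^m$. For each
$1 \le j \le n$, we write $Te_j$ in co-ordinates as

$$Te_j = (a_{1j}, a_{2j}, \ldots, a_{mj}) = (a_{ij})_{1 \le i \le m},$$

i.e., we define $a_{ij}$ to be the $i$-th component of $Te_j$. Then for any n-dimensional row
vector $x = (x_1, \ldots, x_n)$, we have

$$Tx = T\left( \sum_{j=1}^n x_j e_j \right),$$

which (since $T$ is linear) is equal to

$$\begin{aligned}
&= \sum_{j=1}^n T(x_j e_j) \\
&= \sum_{j=1}^n x_j Te_j \\
&= \sum_{j=1}^n x_j (a_{ij})_{1 \le i \le m} \\
&= \sum_{j=1}^n (a_{ij} x_j)_{1 \le i \le m} \\
&= \left( \sum_{j=1}^n a_{ij} x_j \right)_{1 \le i \le m}.
\end{aligned}$$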

But if we let $A$ be the matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}$$

then the previous vector is precisely $L_A x$. Thus $Tx = L_A x$ for all n-dimensional vectors
$x$, and thus $T = L_A$.
Now we show that $A$ is unique, i.e., there does not exist any other matrix

$$B = \begin{pmatrix} b_{11} & b_{12} & \ldots & b_{1n} \\ b_{21} & b_{22} & \ldots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m1} & b_{m2} & \ldots & b_{mn} \end{pmatrix}$$

for which $T$ is equal to $L_B$. Suppose for sake of contradiction that we could find such
a matrix $B$ which was different from $A$. Then we would have $L_A = L_B$. In particular,
we have $L_A e_j = L_B e_j$ for every $1 \le j \le n$. But from the definition of $L_A$ we see that

$$L_A e_j = (a_{ij})_{1 \le i \le m}$$

and

$$L_B e_j = (b_{ij})_{1 \le i \le m}$$

and thus we have $a_{ij} = b_{ij}$ for every $1 \le i \le m$ and $1 \le j \le n$, thus $A$ and $B$ are equal,
a contradiction.
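
The proof of Lemma 6.1.13 is constructive: the j-th column of the matrix is simply $Te_j$. The sketch below follows that recipe. The function names (standard_basis, matrix_of, apply_matrix) are our own, and the two sample transformations are the rotation and projection operators from the earlier examples.

```python
def standard_basis(n):
    """The standard basis row vectors e_1, ..., e_n of R^n, as tuples."""
    return [tuple(1 if k == j else 0 for k in range(n)) for j in range(n)]


def matrix_of(T, n):
    """Matrix A (list of rows) with T = L_A, built column by column from T e_j."""
    columns = [T(e) for e in standard_basis(n)]
    m = len(columns[0])
    # Entry a_ij is the i-th component of T e_j.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def apply_matrix(A, x):
    """Compute L_A x for a matrix A (list of rows) and a row vector x (tuple)."""
    return tuple(sum(a * x_j for a, x_j in zip(row, x)) for row in A)


def rotation(x):
    """Counterclockwise rotation by pi/2 on R^2."""
    return (-x[1], x[0])


def projection(x):
    """Projection (x, y, z) -> (x, y) from R^3 to R^2."""
    return (x[0], x[1])


print(matrix_of(rotation, 2))
print(matrix_of(projection, 3))
```

Applying apply_matrix to the recovered matrix gives back the original transformation on any test vector, which is the content of the identity $T = L_A$.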


Remark 6.1.14 Lemma 6.1.13 establishes a one-to-one correspondence between 
linear transformations and matrices, and is one of the fundamental reasons why 
matrices are so important in linear algebra. One may ask then why we bother dealing 
with linear transformations at all, and why we don’t just work with matrices all the 
time. The reason is that sometimes one does not want to work with the standard basis 
$e_1, \ldots, e_n$, but instead wants to use some other basis. In that case, the correspondence
between linear transformations and matrices changes, and so it is still important to 
keep the notions of linear transformation and matrix distinct. More discussion on 
this somewhat subtle issue can be found in any linear algebra text. 


Remark 6.1.15 If $T = L_A$, then $A$ is sometimes called the matrix representation of
$T$ and is sometimes denoted $A = [T]$. We shall avoid this notation here, however.


The composition $T \circ S$ of two linear transformations $T, S$ is again a linear
transformation (Exercise 6.1.2). It is customary in linear algebra to abbreviate such
compositions $T \circ S$ simply as $TS$. The next lemma shows that the operation of composing
linear transformations is connected to that of matrix multiplication.


Lemma 6.1.16 Let $A$ be an $m \times n$ matrix, and let $B$ be an $n \times p$ matrix. Then
$L_A L_B = L_{AB}$.


Proof See Exercise 6.1.3. 
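
Lemma 6.1.16 can be checked on a concrete pair of matrices. In the sketch below the matrices A and B and the test vector x are arbitrary choices of ours, and matmul and apply_matrix are hypothetical helper names; the check compares $L_A(L_B x)$ with $L_{AB} x$.

```python
def matmul(A, B):
    """Matrix product AB of an m x n matrix A and an n x p matrix B (lists of rows)."""
    n = len(B)
    assert all(len(row) == n for row in A), "inner dimensions must agree"
    p = len(B[0])
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
            for i in range(len(A))]


def apply_matrix(A, x):
    """Compute L_A x for a matrix A (list of rows) and a row vector x (tuple)."""
    return tuple(sum(a * x_j for a, x_j in zip(row, x)) for row in A)


A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[1, 0],
     [0, 1],
     [2, -1]]          # 3 x 2
x = (3, 7)

lhs = apply_matrix(A, apply_matrix(B, x))   # L_A (L_B x)
rhs = apply_matrix(matmul(A, B), x)         # L_{AB} x
print(lhs, rhs)
```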


— Exercise — 
Exercise 6.1.1 Prove Lemma 6.1.2. 


Exercise 6.1.2 If $T : \mathbf{R}^n \to \mathbf{R}^m$ is a linear transformation, and $S : \mathbf{R}^p \to \mathbf{R}^n$ is a linear
transformation, show that the composition $TS : \mathbf{R}^p \to \mathbf{R}^m$ of the two transforms,
defined by $TS(x) := T(S(x))$, is also a linear transformation. (Hint: expand $TS(x + y)$
and $TS(cx)$ carefully, using plenty of parentheses.)


Exercise 6.1.3 Prove Lemma 6.1.16. 


Exercise 6.1.4 Let $T : \mathbf{R}^n \to \mathbf{R}^m$ be a linear transformation. Show that there exists
a number $M > 0$ such that $\|Tx\| \le M \|x\|$ for all $x \in \mathbf{R}^n$. (Hint: use Lemma 6.1.13 to
write $T$ in terms of a matrix $A$, and then set $M$ to be the sum of the absolute values of
all the entries in $A$. Use the triangle inequality often; it's easier than messing around
with square roots, etc.) Conclude in particular that every linear transformation from
$\mathbf{R}^n$ to $\mathbf{R}^m$ is continuous.


6.2 Derivatives in Several Variable Calculus 


Now that we've reviewed some linear algebra, we turn to the main topic of
this chapter, which is that of understanding differentiation of functions of the form
$f : \mathbf{R}^n \to \mathbf{R}^m$, i.e., functions from one Euclidean space to another. For instance, one
might want to differentiate the function $f : \mathbf{R}^3 \to \mathbf{R}^4$ defined by

$$f(x, y, z) = (xy, yz, xz, xyz).$$

In single-variable calculus, when one wants to differentiate a function $f : E \to \mathbf{R}$
at a point $x_0$, where $E$ is a subset of $\mathbf{R}$ that contains $x_0$, this is given by

$$f'(x_0) := \lim_{x \to x_0; x \in E \setminus \{x_0\}} \frac{f(x) - f(x_0)}{x - x_0}.$$

One could try to mimic this definition in the several variable case $f : E \to \mathbf{R}^m$, where
$E$ is now a subset of $\mathbf{R}^n$; however we encounter a difficulty in this case: the quantity
$f(x) - f(x_0)$ will live in $\mathbf{R}^m$, and $x - x_0$ lives in $\mathbf{R}^n$, and we do not know how to
divide an m-dimensional vector by an n-dimensional vector.

To get around this problem, we first rewrite the concept of derivative (in one
dimension) in a way which does not involve division of vectors. Instead, we view
differentiability at a point $x_0$ as an assertion that a function is "approximately linear"
near $x_0$.


Lemma 6.2.1 Let $E$ be a subset of $\mathbf{R}$, $f : E \to \mathbf{R}$ be a function, and $L \in \mathbf{R}$. Let $x_0$
be a limit point of $E$. Then the following two statements are equivalent.


(a) $f$ is differentiable at $x_0$, and $f'(x_0) = L$.
(b) We have $\lim_{x \to x_0; x \in E - \{x_0\}} \frac{|f(x) - (f(x_0) + L(x - x_0))|}{|x - x_0|} = 0$.


Proof See Exercise 6.2.1.


In light of the above lemma, we see that the derivative $f'(x_0)$ can be interpreted
as the number $L$ for which $|f(x) - (f(x_0) + L(x - x_0))|$ is small, in the sense that it
tends to zero as $x$ tends to $x_0$, even if we divide out by the very small number $|x - x_0|$.
More informally, the derivative is the quantity $L$ such that we have the approximation
$f(x) - f(x_0) \approx L(x - x_0)$.

This does not seem too different from the usual notion of differentiation, but the
point is that we are no longer explicitly dividing by $x - x_0$. (We are still dividing by
$|x - x_0|$, but this will turn out to be OK.) When we move to the several variable case
$f : E \to \mathbf{R}^m$, where $E \subseteq \mathbf{R}^n$, we shall still want the derivative to be some quantity
$L$ such that $f(x) - f(x_0) \approx L(x - x_0)$. However, since $f(x) - f(x_0)$ is now an
m-dimensional vector and $x - x_0$ is an n-dimensional vector, we no longer want $L$ to
be a scalar; we want it to be a linear transformation. More precisely:


Definition 6.2.2 (Differentiability) Let $E$ be a subset of $\mathbf{R}^n$, $f : E \to \mathbf{R}^m$ be a
function, $x_0 \in E$ be a limit point of $E$, and let $L : \mathbf{R}^n \to \mathbf{R}^m$ be a linear transformation.
We say that $f$ is differentiable at $x_0$ with derivative $L$ if we have

$$\lim_{x \to x_0; x \in E - \{x_0\}} \frac{\|f(x) - (f(x_0) + L(x - x_0))\|}{\|x - x_0\|} = 0.$$

Here $\|x\|$ is the length of $x$ (as measured in the $l^2$ metric):

$$\|(x_1, x_2, \ldots, x_n)\| = (x_1^2 + x_2^2 + \ldots + x_n^2)^{1/2}.$$


Example 6.2.3 Let $f : \mathbf{R}^2 \to \mathbf{R}^2$ be the map $f(x, y) := (x^2, y^2)$, let $x_0$ be the point
$x_0 := (1, 2)$, and let $L : \mathbf{R}^2 \to \mathbf{R}^2$ be the map $L(x, y) := (2x, 4y)$. We claim that $f$ is
differentiable at $x_0$ with derivative $L$. To see this, we compute

$$\lim_{(x,y) \to (1,2): (x,y) \ne (1,2)} \frac{\|f(x, y) - (f(1, 2) + L((x, y) - (1, 2)))\|}{\|(x, y) - (1, 2)\|}.$$

Making the change of variables $(x, y) = (1, 2) + (a, b)$, this becomes

$$\lim_{(a,b) \to (0,0): (a,b) \ne (0,0)} \frac{\|f(1 + a, 2 + b) - (f(1, 2) + L(a, b))\|}{\|(a, b)\|}.$$

Substituting the formula for $f$ and for $L$, this becomes

$$\lim_{(a,b) \to (0,0): (a,b) \ne (0,0)} \frac{\|((1 + a)^2, (2 + b)^2) - (1, 4) - (2a, 4b)\|}{\|(a, b)\|},$$

which simplifies to

$$\lim_{(a,b) \to (0,0): (a,b) \ne (0,0)} \frac{\|(a^2, b^2)\|}{\|(a, b)\|}.$$

We use the squeeze test. The expression $\frac{\|(a^2, b^2)\|}{\|(a, b)\|}$ is clearly non-negative. On the other
hand, we have by the triangle inequality

$$\|(a^2, b^2)\| \le \|(a^2, 0)\| + \|(0, b^2)\| = a^2 + b^2$$

and hence

$$\frac{\|(a^2, b^2)\|}{\|(a, b)\|} \le \sqrt{a^2 + b^2}.$$

Since $\sqrt{a^2 + b^2} \to 0$ as $(a, b) \to 0$, we thus see from the squeeze test that the above
limit exists and is equal to 0. Thus $f$ is differentiable at $x_0$ with derivative $L$.
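
The limit in Example 6.2.3 can also be observed numerically. The following sketch evaluates the quotient $\|f(x_0 + h) - (f(x_0) + Lh)\| / \|h\|$ for shrinking displacements $h = (a, b)$; the function name newton_ratio and the particular displacements are our own choices, and a table of floating-point values is of course no substitute for the squeeze-test argument.

```python
import math


def f(x, y):
    """The map f(x, y) := (x^2, y^2) of Example 6.2.3."""
    return (x * x, y * y)


def L(a, b):
    """The candidate derivative L(x, y) := (2x, 4y) at x0 = (1, 2)."""
    return (2 * a, 4 * b)


def norm(v):
    """Euclidean (l^2) length of a vector."""
    return math.sqrt(sum(c * c for c in v))


def newton_ratio(a, b):
    """||f(x0 + h) - (f(x0) + L h)|| / ||h|| at x0 = (1, 2), for h = (a, b) != 0."""
    fx = f(1 + a, 2 + b)
    f0 = f(1, 2)
    Lh = L(a, b)
    error = tuple(fx[i] - f0[i] - Lh[i] for i in range(2))
    return norm(error) / norm((a, b))


for k in range(1, 6):
    h = 10.0 ** (-k)
    # The ratio should be bounded by sqrt(a^2 + b^2), hence shrink with h.
    print(h, newton_ratio(h, -h))
```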


As you can see, verifying that a function is differentiable from first principles can 
be somewhat tedious. Later on we shall find better ways to verify differentiability, 
and to compute derivatives. 

Before we proceed further, we have to check a basic fact, which is that a function 
can have at most one derivative at any interior point of its domain: 


Lemma 6.2.4 (Uniqueness of derivatives) Let $E$ be a subset of $\mathbf{R}^n$, $f : E \to \mathbf{R}^m$ be a
function, $x_0 \in E$ be an interior point of $E$, and let $L_1 : \mathbf{R}^n \to \mathbf{R}^m$ and $L_2 : \mathbf{R}^n \to \mathbf{R}^m$
be linear transformations. Suppose that $f$ is differentiable at $x_0$ with derivative $L_1$,
and also differentiable at $x_0$ with derivative $L_2$. Then $L_1 = L_2$.


Proof See Exercise 6.2.2. 


Because of Lemma 6.2.4, we can now talk about the derivative of $f$ at interior
points $x_0$, and we will denote this derivative by $f'(x_0)$. Thus $f'(x_0)$ is the unique linear
transformation from $\mathbf{R}^n$ to $\mathbf{R}^m$ such that

$$\lim_{x \to x_0; x \ne x_0} \frac{\|f(x) - (f(x_0) + f'(x_0)(x - x_0))\|}{\|x - x_0\|} = 0.$$

Informally, this means that the derivative $f'(x_0)$ is the linear transformation such that
we have

$$f(x) - f(x_0) \approx f'(x_0)(x - x_0)$$

or equivalently

$$f(x) \approx f(x_0) + f'(x_0)(x - x_0)$$

(this is known as Newton's approximation; compare with Proposition 10.1.7).
Another consequence of Lemma 6.2.4 is that if you know that $f(x) = g(x)$ for all
$x \in E$, and $f, g$ are differentiable at $x_0$, then you also know that $f'(x_0) = g'(x_0)$ at
every interior point of $E$. However, this is not necessarily true if $x_0$ is a boundary point
of $E$; for instance, if $E$ is just a single point $E = \{x_0\}$, merely knowing that
$f(x_0) = g(x_0)$ does not imply that $f'(x_0) = g'(x_0)$. We will not deal with these boundary
issues here and only compute derivatives on the interior of the domain.

We will sometimes refer to $f'$ as the total derivative of $f$, to distinguish this concept
from that of partial and directional derivatives below. The total derivative $f'$ is also
closely related to the derivative matrix $Df$, which we shall define in the next section.


— Exercise — 
Exercise 6.2.1 Prove Lemma 6.2.1. 


Exercise 6.2.2 Prove Lemma 6.2.4. (Hint: prove by contradiction. If $L_1 \ne L_2$, then
there exists a vector $v$ such that $L_1 v \ne L_2 v$; this vector must be nonzero (why?). Now
apply the definition of derivative, and try to specialize to the case where $x = x_0 + tv$
for some scalar $t$, to obtain a contradiction.)


6.3 Partial and Directional Derivatives 
We now connect the notion of differentiability with that of partial and directional 
derivatives, which we now introduce. 


Definition 6.3.1 (Directional derivative) Let $E$ be a subset of $\mathbf{R}^n$, $f : E \to \mathbf{R}^m$ be a
function, let $x_0$ be an interior point of $E$, and let $v$ be a vector in $\mathbf{R}^n$. If the limit

$$\lim_{t \to 0; t > 0, x_0 + tv \in E} \frac{f(x_0 + tv) - f(x_0)}{t}$$

exists, we say that $f$ is differentiable in the direction $v$ at $x_0$, and we denote the above
limit by $D_v f(x_0)$:

$$D_v f(x_0) := \lim_{t \to 0; t > 0} \frac{f(x_0 + tv) - f(x_0)}{t}.$$


Remark 6.3.2 One should compare this definition with Definition 6.2.2. Note that
we are dividing by a scalar $t$, rather than a vector, so this definition makes sense, and
$D_v f(x_0)$ will be a vector in $\mathbf{R}^m$. It is sometimes possible to also define directional
derivatives on the boundary of $E$, if the vector $v$ is pointing in an "inward" direction
(this generalizes the notion of left derivatives and right derivatives from single-variable
calculus); but we will not pursue these matters here.


Example 6.3.3 If $f : \mathbf{R} \to \mathbf{R}$ is a function, then $D_{+1} f(x)$ is the same as the right
derivative of $f(x)$ (if it exists), and similarly $D_{-1} f(x)$ is the same as the negative of
the left derivative of $f(x)$ (if it exists).


Example 6.3.4 We use the function $f : \mathbf{R}^2 \to \mathbf{R}^2$ defined by $f(x, y) := (x^2, y^2)$ from
before, and let $x_0 := (1, 2)$ and $v := (3, 4)$. Then

$$\begin{aligned}
D_v f(x_0) &= \lim_{t \to 0; t > 0} \frac{f(1 + 3t, 2 + 4t) - f(1, 2)}{t} \\
&= \lim_{t \to 0; t > 0} \frac{(1 + 6t + 9t^2, 4 + 16t + 16t^2) - (1, 4)}{t} \\
&= \lim_{t \to 0; t > 0} (6 + 9t, 16 + 16t) = (6, 16).
\end{aligned}$$
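
The computation in Example 6.3.4 can be mirrored by evaluating the difference quotient for small positive $t$. This is a sketch only; directional_quotient is a name of our own, and the printed values should approach $(6, 16)$ since the quotient equals $(6 + 9t, 16 + 16t)$ exactly.

```python
def f(x, y):
    """The map f(x, y) := (x^2, y^2) used in Example 6.3.4."""
    return (x * x, y * y)


def directional_quotient(f, x0, v, t):
    """The quotient (f(x0 + t v) - f(x0)) / t for a scalar t > 0."""
    moved = f(*(p + t * d for p, d in zip(x0, v)))
    base = f(*x0)
    return tuple((a - b) / t for a, b in zip(moved, base))


for k in range(1, 7):
    t = 10.0 ** (-k)
    print(t, directional_quotient(f, (1, 2), (3, 4), t))
```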


Directional derivatives are connected with total derivatives as follows: 


Lemma 6.3.5 Let $E$ be a subset of $\mathbf{R}^n$, $f : E \to \mathbf{R}^m$ be a function, $x_0$ be an interior
point of $E$, and let $v$ be a vector in $\mathbf{R}^n$. If $f$ is differentiable at $x_0$, then $f$ is also
differentiable in the direction $v$ at $x_0$, and

$$D_v f(x_0) = f'(x_0) v.$$


Proof See Exercise 6.3.1. 


Remark 6.3.6 One consequence of this lemma is that total differentiability implies 
directional differentiability. However, the converse is not true; see Exercise 6.3.3. 


Closely related to the concept of directional derivative is that of partial derivative: 


Definition 6.3.7 (Partial derivative) Let $E$ be a subset of $\mathbf{R}^n$, let $f : E \to \mathbf{R}^m$ be a
function, let $x_0$ be an interior point of $E$, and let $1 \le j \le n$. Then the partial derivative
of $f$ with respect to the $x_j$ variable at $x_0$, denoted $\frac{\partial f}{\partial x_j}(x_0)$, is defined by

$$\frac{\partial f}{\partial x_j}(x_0) := \lim_{t \to 0; t \ne 0, x_0 + te_j \in E} \frac{f(x_0 + te_j) - f(x_0)}{t} = \frac{d}{dt} f(x_0 + te_j)\Big|_{t=0}$$

provided of course that the limit exists. (If the limit does not exist, we leave $\frac{\partial f}{\partial x_j}(x_0)$
undefined.)
We say that $f$ is continuously differentiable if the partial derivatives $\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}$
exist and are continuous on $E$.


Informally, the partial derivative can be obtained by holding all the variables
other than $x_j$ fixed and then applying the single-variable calculus derivative in the $x_j$
variable. Note that if $f$ takes values in $\mathbf{R}^m$, then so will $\frac{\partial f}{\partial x_j}$. Indeed, if we write $f$ in
components as $f = (f_1, \ldots, f_m)$, it is easy to see (why?) that

$$\frac{\partial f}{\partial x_j}(x_0) = \left( \frac{\partial f_1}{\partial x_j}(x_0), \ldots, \frac{\partial f_m}{\partial x_j}(x_0) \right),$$

i.e., to differentiate a vector-valued function one just has to differentiate each of the
components separately.

We sometimes replace the variables $x_j$ in $\frac{\partial f}{\partial x_j}$ with other symbols. For instance, if
we are dealing with the function $f(x, y) = (x^2, y^2)$, then we might refer to $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$
instead of $\frac{\partial f}{\partial x_1}$ and $\frac{\partial f}{\partial x_2}$. (In this case, $\frac{\partial f}{\partial x}(x, y) = (2x, 0)$ and $\frac{\partial f}{\partial y}(x, y) = (0, 2y)$.) One
should caution however that one should only relabel the variables if it is absolutely
clear which symbol refers to the first variable, which symbol refers to the second
variable, etc.; otherwise one may become unintentionally confused. For instance, in the
above example, the expression $\frac{\partial f}{\partial x}(x, x)$ is just $(2x, 0)$; however one may mistakenly
compute

$$\frac{\partial f}{\partial x}(x, x) = \frac{\partial}{\partial x}(x^2, x^2) = (2x, 2x);$$

the problem here is that the symbol $x$ is being used for more than just the first variable
of $f$. (On the other hand, it is true that $\frac{d}{dx} f(x, x)$ is equal to $(2x, 2x)$; thus the operation
of total differentiation $\frac{d}{dx}$ is not the same as that of partial differentiation $\frac{\partial}{\partial x}$.)

From Lemma 6.3.5 (and Proposition 9.5.3 from Analysis I), we know that if a
function is differentiable at a point $x_0$, then all the partial derivatives $\frac{\partial f}{\partial x_j}$ exist at $x_0$,
and that

$$\frac{\partial f}{\partial x_j}(x_0) = D_{e_j} f(x_0) = -D_{-e_j} f(x_0) = f'(x_0) e_j.$$

Also, if $v = (v_1, \ldots, v_n) = \sum_j v_j e_j$, then we have

$$D_v f(x_0) = f'(x_0) \sum_j v_j e_j = \sum_j v_j f'(x_0) e_j$$

(since $f'(x_0)$ is linear) and thus

$$D_v f(x_0) = \sum_j v_j \frac{\partial f}{\partial x_j}(x_0).$$

Thus one can write directional derivatives in terms of partial derivatives, provided
that the function is actually differentiable at that point.
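
The formula $D_v f(x_0) = \sum_j v_j \frac{\partial f}{\partial x_j}(x_0)$ can be tried out on the running example $f(x, y) = (x^2, y^2)$ at $x_0 = (1, 2)$ with $v = (3, 4)$. The sketch below estimates each partial derivative by a central difference quotient (a numerical device of our own choosing, with a hypothetical step size h) and then forms the weighted sum; the result should be close to $(6, 16)$.

```python
def f(x, y):
    """The differentiable map f(x, y) := (x^2, y^2)."""
    return (x * x, y * y)


def partial(f, x0, j, h=1e-6):
    """Central-difference estimate of the j-th partial derivative of f at x0."""
    up = list(x0)
    up[j] += h
    down = list(x0)
    down[j] -= h
    return tuple((a - b) / (2 * h) for a, b in zip(f(*up), f(*down)))


def directional_from_partials(f, x0, v):
    """Sum over j of v_j * (df/dx_j)(x0); meaningful when f is differentiable at x0."""
    n = len(x0)
    partials = [partial(f, x0, j) for j in range(n)]
    m = len(partials[0])
    return tuple(sum(v[j] * partials[j][i] for j in range(n)) for i in range(m))


print(directional_from_partials(f, (1.0, 2.0), (3.0, 4.0)))
```

The caveat in the text matters here: for a function that is not differentiable at $x_0$ the same sum can be computed, but it need not agree with the directional derivative.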

Just because the partial derivatives exist at a point $x_0$, we cannot conclude that
the function is differentiable there (Exercise 6.3.3). However, if we know that the
partial derivatives not only exist, but are continuous, then we can in fact conclude
differentiability, thanks to the following handy theorem:


Theorem 6.3.8 Let $E$ be a subset of $\mathbf{R}^n$, $f : E \to \mathbf{R}^m$ be a function, $F$ be a subset
of $E$, and $x_0$ be an interior point of $F$. If all the partial derivatives $\frac{\partial f}{\partial x_j}$ exist on $F$
and are continuous at $x_0$, then $f$ is differentiable at $x_0$, and the linear transformation
$f'(x_0) : \mathbf{R}^n \to \mathbf{R}^m$ is defined by

$$f'(x_0)(v_j)_{1 \le j \le n} = \sum_{j=1}^n v_j \frac{\partial f}{\partial x_j}(x_0).$$
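

Proof Let $L : \mathbf{R}^n \to \mathbf{R}^m$ be the linear transformation

$$L(v_j)_{1 \le j \le n} := \sum_{j=1}^n v_j \frac{\partial f}{\partial x_j}(x_0).$$

We have to prove that

$$\lim_{x \to x_0; x \in E - \{x_0\}} \frac{\|f(x) - (f(x_0) + L(x - x_0))\|}{\|x - x_0\|} = 0.$$

Let $\varepsilon > 0$. It will suffice to find a radius $\delta > 0$ such that

$$\frac{\|f(x) - (f(x_0) + L(x - x_0))\|}{\|x - x_0\|} \le \varepsilon$$

for all $x \in B(x_0, \delta) \setminus \{x_0\}$. Equivalently, we wish to show that

$$\|f(x) - f(x_0) - L(x - x_0)\| \le \varepsilon \|x - x_0\|$$

for all $x \in B(x_0, \delta) \setminus \{x_0\}$.
Because $x_0$ is an interior point of $F$, there exists a ball $B(x_0, r)$ which is contained
inside $F$. Because each partial derivative $\frac{\partial f}{\partial x_j}$ exists on $F$ and is continuous at $x_0$, there
thus exists an $0 < \delta_j < r$ such that $\|\frac{\partial f}{\partial x_j}(x) - \frac{\partial f}{\partial x_j}(x_0)\| \le \varepsilon/nm$ for every $x \in B(x_0, \delta_j)$.
If we take $\delta = \min(\delta_1, \ldots, \delta_n)$, then we thus have $\|\frac{\partial f}{\partial x_j}(x) - \frac{\partial f}{\partial x_j}(x_0)\| \le \varepsilon/nm$ for
every $x \in B(x_0, \delta)$ and every $1 \le j \le n$.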

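Let $x \in B(x_0, \delta)$. We write $x = x_0 + v_1 e_1 + v_2 e_2 + \ldots + v_n e_n$ for some scalars
$v_1, \ldots, v_n$. Note that

$$\|x - x_0\| = \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2}$$

and in particular we have $|v_j| \le \|x - x_0\|$ for all $1 \le j \le n$. Our task is to show that

$$\left\| f(x_0 + v_1 e_1 + \ldots + v_n e_n) - f(x_0) - \sum_{j=1}^n v_j \frac{\partial f}{\partial x_j}(x_0) \right\| \le \varepsilon \|x - x_0\|.$$

Write $f$ in components as $f = (f_1, f_2, \ldots, f_m)$ (so each $f_i$ is a function from $E$ to $\mathbf{R}$).
From the mean value theorem in the $x_1$ variable, we see that

$$f_i(x_0 + v_1 e_1) - f_i(x_0) = \frac{\partial f_i}{\partial x_1}(x_0 + t_i e_1) v_1$$

for some $t_i$ between 0 and $v_1$. But we have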
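
$$\left| \frac{\partial f_i}{\partial x_1}(x_0 + t_i e_1) - \frac{\partial f_i}{\partial x_1}(x_0) \right| \le \left\| \frac{\partial f}{\partial x_1}(x_0 + t_i e_1) - \frac{\partial f}{\partial x_1}(x_0) \right\| \le \varepsilon/nm$$

and hence

$$\left| f_i(x_0 + v_1 e_1) - f_i(x_0) - \frac{\partial f_i}{\partial x_1}(x_0) v_1 \right| \le \varepsilon |v_1|/nm.$$

Summing this over all $1 \le i \le m$ (and noting that $\|(y_1, \ldots, y_m)\| \le |y_1| + \ldots + |y_m|$
from the triangle inequality) we obtain

$$\left\| f(x_0 + v_1 e_1) - f(x_0) - \frac{\partial f}{\partial x_1}(x_0) v_1 \right\| \le \varepsilon |v_1|/n;$$

since $|v_1| \le \|x - x_0\|$, we thus have

$$\left\| f(x_0 + v_1 e_1) - f(x_0) - \frac{\partial f}{\partial x_1}(x_0) v_1 \right\| \le \varepsilon \|x - x_0\|/n.$$

A similar argument gives

$$\left\| f(x_0 + v_1 e_1 + v_2 e_2) - f(x_0 + v_1 e_1) - \frac{\partial f}{\partial x_2}(x_0) v_2 \right\| \le \varepsilon \|x - x_0\|/n$$

and so forth up to

$$\left\| f(x_0 + v_1 e_1 + \ldots + v_n e_n) - f(x_0 + v_1 e_1 + \ldots + v_{n-1} e_{n-1}) - \frac{\partial f}{\partial x_n}(x_0) v_n \right\| \le \varepsilon \|x - x_0\|/n.$$

If we sum these n inequalities and use the triangle inequality $\|x + y\| \le \|x\| + \|y\|$,
we obtain a telescoping series which simplifies to

$$\left\| f(x_0 + v_1 e_1 + \ldots + v_n e_n) - f(x_0) - \sum_{j=1}^n \frac{\partial f}{\partial x_j}(x_0) v_j \right\| \le \varepsilon \|x - x_0\|$$

as desired.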


From Theorem 6.3.8 and Lemma 6.3.5 we see that if the partial derivatives of a
function $f : E \to \mathbf{R}^m$ exist and are continuous on some set $F$, then all the directional
derivatives also exist at every interior point $x_0$ of $F$, and we have the formula

$$D_{(v_1, \ldots, v_n)} f(x_0) = \sum_{j=1}^n v_j \frac{\partial f}{\partial x_j}(x_0).$$
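
In particular, if $f : E \to \mathbf{R}$ is a real-valued function, and we define the gradient
$\nabla f(x_0)$ of $f$ at $x_0$ to be the n-dimensional row vector $\nabla f(x_0) := (\frac{\partial f}{\partial x_1}(x_0), \ldots, \frac{\partial f}{\partial x_n}(x_0))$,
then we have the familiar formula

$$D_v f(x_0) = v \cdot \nabla f(x_0)$$

whenever $x_0$ is in the interior of the region where the gradient exists and is continuous.
More generally, if $f : E \to \mathbf{R}^m$ is a function taking values in $\mathbf{R}^m$, with
$f = (f_1, \ldots, f_m)$, and $x_0$ is in the interior of the region where the partial derivatives of $f$
exist and are continuous, then we have from Theorem 6.3.8 that

$$f'(x_0)(v_j)_{1 \le j \le n} = \sum_{j=1}^n v_j \frac{\partial f}{\partial x_j}(x_0) = \left( \sum_{j=1}^n v_j \frac{\partial f_i}{\partial x_j}(x_0) \right)_{1 \le i \le m},$$

which we can rewrite as

$$L_{Df(x_0)} (v_j)_{1 \le j \le n},$$

where $Df(x_0)$ is the $m \times n$ matrix

$$Df(x_0) := \left( \frac{\partial f_i}{\partial x_j}(x_0) \right)_{1 \le i \le m; 1 \le j \le n} = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(x_0) & \frac{\partial f_1}{\partial x_2}(x_0) & \ldots & \frac{\partial f_1}{\partial x_n}(x_0) \\ \frac{\partial f_2}{\partial x_1}(x_0) & \frac{\partial f_2}{\partial x_2}(x_0) & \ldots & \frac{\partial f_2}{\partial x_n}(x_0) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x_0) & \frac{\partial f_m}{\partial x_2}(x_0) & \ldots & \frac{\partial f_m}{\partial x_n}(x_0) \end{pmatrix}.$$

Thus we have

$$(D_v f(x_0))^T = (f'(x_0) v)^T = Df(x_0) v^T.$$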


The matrix $Df(x_0)$ is sometimes also called the derivative matrix or differential
matrix of $f$ at $x_0$ and is closely related to the total derivative $f'(x_0)$. One can also
write $Df$ as

$$Df(x_0) = \left( \frac{\partial f}{\partial x_1}(x_0)^T, \frac{\partial f}{\partial x_2}(x_0)^T, \ldots, \frac{\partial f}{\partial x_n}(x_0)^T \right),$$

i.e., each of the columns of $Df(x_0)$ is one of the partial derivatives of $f$, expressed as
a column vector. Or one could write

$$Df(x_0) = \begin{pmatrix} \nabla f_1(x_0) \\ \nabla f_2(x_0) \\ \vdots \\ \nabla f_m(x_0) \end{pmatrix}$$

i.e., the rows of $Df(x_0)$ are the gradients of the various components of $f$. In particular, if
$f$ is scalar-valued (i.e., $m = 1$), then $Df$ is the same as $\nabla f$.


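Example 6.3.9 Let $f : \mathbf{R}^2 \to \mathbf{R}^2$ be the function $f(x, y) = (x^2 + xy, y^2)$. Then $\frac{\partial f}{\partial x} = (2x + y, 0)$
and $\frac{\partial f}{\partial y} = (x, 2y)$. Since these partial derivatives are continuous on $\mathbf{R}^2$,
we see that $f$ is differentiable on all of $\mathbf{R}^2$, and

$$Df(x, y) = \begin{pmatrix} 2x + y & x \\ 0 & 2y \end{pmatrix}.$$

Thus for instance, the directional derivative in the direction $(v, w)$ is
$D_{(v,w)} f(x, y) = ((2x + y)v + xw, 2yw)$.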
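
The derivative matrix of Example 6.3.9 can be compared with a finite-difference estimate. The sketch below is an illustration under our own assumptions: Df_numeric uses central differences with a hypothetical step size h, matrices are lists of rows, and the comparison point $(1.5, -2)$ is an arbitrary choice.

```python
def f(x, y):
    """The map f(x, y) = (x^2 + x*y, y^2) of Example 6.3.9."""
    return (x * x + x * y, y * y)


def Df_exact(x, y):
    """The derivative matrix computed in Example 6.3.9, as a list of rows."""
    return [[2 * x + y, x],
            [0, 2 * y]]


def Df_numeric(f, x0, h=1e-6):
    """Central-difference estimate of Df(x0); column j approximates df/dx_j."""
    n = len(x0)
    columns = []
    for j in range(n):
        up = list(x0)
        up[j] += h
        down = list(x0)
        down[j] -= h
        columns.append([(a - b) / (2 * h) for a, b in zip(f(*up), f(*down))])
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def apply_matrix(A, v):
    """Compute the entries of A v^T, returned as a tuple."""
    return tuple(sum(a * v_j for a, v_j in zip(row, v)) for row in A)


x0 = (1.5, -2.0)
print(Df_exact(*x0))
print(Df_numeric(f, x0))
# Directional derivative at (1, 2) in the direction (3, 4) via Df(x0) v^T.
print(apply_matrix(Df_exact(1, 2), (3, 4)))
```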
— Exercise — 


Exercise 6.3.1 Prove Lemma 6.3.5. (This will be similar to Exercise 6.2.1). 


Exercise 6.3.2 Let $E$ be a subset of $\mathbf{R}^n$, let $f : E \to \mathbf{R}^m$ be a function, let $x_0$ be an
interior point of $E$, and let $1 \le j \le n$. Show that $\frac{\partial f}{\partial x_j}(x_0)$ exists if and only if $D_{e_j} f(x_0)$
and $D_{-e_j} f(x_0)$ exist and are negatives of each other (thus $D_{e_j} f(x_0) = -D_{-e_j} f(x_0)$);
furthermore, one has $\frac{\partial f}{\partial x_j}(x_0) = D_{e_j} f(x_0)$ in this case.


Exercise 6.3.3 Let $f : \mathbf{R}^2 \to \mathbf{R}$ be the function defined by $f(x, y) := \frac{x^3}{x^2 + y^2}$ when
$(x, y) \ne (0, 0)$, and $f(0, 0) := 0$. Show that $f$ is not differentiable at $(0, 0)$, despite
being differentiable in every direction $v \in \mathbf{R}^2$ at $(0, 0)$. Explain why this does not
contradict Theorem 6.3.8.


Exercise 6.3.4 Let $f : \mathbf{R}^n \to \mathbf{R}^m$ be a differentiable function such that $f'(x) = 0$ for
all $x \in \mathbf{R}^n$. Show that $f$ is constant. (Hint: you may use the mean value theorem or
fundamental theorem of calculus for one-dimensional functions, but bear in mind
that there is no direct analogue of these theorems for several variable functions. I
would not advise proceeding via first principles.) For a tougher challenge, replace
the domain $\mathbf{R}^n$ by an open connected subset $\Omega$ of $\mathbf{R}^n$.


6.4 The Several Variable Calculus Chain Rule


We are now ready to state the several variable calculus chain rule. Recall that if
$f : X \to Y$ and $g : Y \to Z$ are two functions, then the composition $g \circ f : X \to Z$ is
defined by $g \circ f(x) := g(f(x))$ for all $x \in X$.


Theorem 6.4.1 (Several variable calculus chain rule) Let $E$ be a subset of $\mathbf{R}^n$, and
let $F$ be a subset of $\mathbf{R}^m$. Let $f : E \to F$ be a function, and let $g : F \to \mathbf{R}^p$ be another
function. Let $x_0$ be a point in the interior of $E$. Suppose that $f$ is differentiable at $x_0$,
and that $f(x_0)$ is in the interior of $F$. Suppose also that $g$ is differentiable at $f(x_0)$.
Then $g \circ f : E \to \mathbf{R}^p$ is also differentiable at $x_0$, and we have the formula

$$(g \circ f)'(x_0) = g'(f(x_0)) f'(x_0).$$


Proof See Exercise 6.4.3. 


One should compare this theorem with the single-variable chain rule, Theorem 
10.1.15; indeed one can easily deduce the single-variable rule as a consequence of 
the several variable rule. 

Intuitively, one can think of the several variable chain rule as follows. Let $x$ be
close to $x_0$. Then Newton's approximation asserts that

$$f(x) - f(x_0) \approx f'(x_0)(x - x_0)$$

and in particular $f(x)$ is close to $f(x_0)$. Since $g$ is differentiable at $f(x_0)$, we see from
Newton's approximation again that

$$g(f(x)) - g(f(x_0)) \approx g'(f(x_0))(f(x) - f(x_0)).$$

Combining the two, we obtain

$$g \circ f(x) - g \circ f(x_0) \approx g'(f(x_0)) f'(x_0)(x - x_0)$$

which then should give $(g \circ f)'(x_0) = g'(f(x_0)) f'(x_0)$. This argument however is
rather imprecise; to make it more precise one needs to manipulate limits rigorously;
see Exercise 6.4.3.

As a corollary of the chain rule and Lemma 6.1.16 (and Lemma 6.1.13), we see
that

$$D(g \circ f)(x_0) = Dg(f(x_0)) Df(x_0);$$

i.e., we can write the chain rule in terms of matrices and matrix multiplication, instead
of in terms of linear transformations and composition.
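
The matrix form of the chain rule lends itself to a numerical sanity check. In the sketch below, $f$ is the map of Example 6.3.9, while $g(u, w) := (uw, u - w)$ is a hypothetical second map chosen by us purely for illustration; its derivative matrix Dg is computed by hand. We compare the product $Dg(f(x_0)) Df(x_0)$ with a central-difference estimate of $D(g \circ f)(x_0)$ at $x_0 = (1, 2)$.

```python
def f(x, y):
    """f(x, y) = (x^2 + x*y, y^2), as in Example 6.3.9."""
    return (x * x + x * y, y * y)


def g(u, w):
    """A hypothetical differentiable map g(u, w) = (u*w, u - w)."""
    return (u * w, u - w)


def Df(x, y):
    return [[2 * x + y, x],
            [0, 2 * y]]


def Dg(u, w):
    return [[w, u],
            [1, -1]]


def compose(x, y):
    """The composition g o f."""
    return g(*f(x, y))


def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B))) for k in range(len(B[0]))]
            for i in range(len(A))]


def jacobian(F, x0, h=1e-6):
    """Central-difference estimate of the derivative matrix DF(x0)."""
    n = len(x0)
    columns = []
    for j in range(n):
        up = list(x0)
        up[j] += h
        down = list(x0)
        down[j] -= h
        columns.append([(a - b) / (2 * h) for a, b in zip(F(*up), F(*down))])
    return [[columns[j][i] for j in range(n)] for i in range(len(columns[0]))]


x0 = (1.0, 2.0)
exact = matmul(Dg(*f(*x0)), Df(*x0))   # Dg(f(x0)) Df(x0)
numeric = jacobian(compose, x0)        # estimate of D(g o f)(x0)
print(exact)
print(numeric)
```

Note the order of the factors: $Dg$ is evaluated at $f(x_0)$, not at $x_0$, and it multiplies $Df(x_0)$ from the left.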


Example 6.4.2 Let $f : \mathbf{R}^n \to \mathbf{R}$ and $g : \mathbf{R}^n \to \mathbf{R}$ be differentiable functions. We
form the combined function $h : \mathbf{R}^n \to \mathbf{R}^2$ by defining $h(x) := (f(x), g(x))$. Now let
$k : \mathbf{R}^2 \to \mathbf{R}$ be the multiplication function $k(a, b) := ab$. Note that

$$Dh(x_0) = \begin{pmatrix} \nabla f(x_0) \\ \nabla g(x_0) \end{pmatrix}$$

while

$$Dk(a, b) = (b, a)$$

(why?). By the chain rule, we thus see that

$$D(k \circ h)(x_0) = (g(x_0), f(x_0)) \begin{pmatrix} \nabla f(x_0) \\ \nabla g(x_0) \end{pmatrix} = g(x_0) \nabla f(x_0) + f(x_0) \nabla g(x_0).$$

But $k \circ h = fg$ (why?), and $D(fg) = \nabla(fg)$. We have thus proven the product rule

$$\nabla(fg) = g \nabla f + f \nabla g.$$

A similar argument gives the sum rule $\nabla(f + g) = \nabla f + \nabla g$, or the difference
rule $\nabla(f - g) = \nabla f - \nabla g$, as well as the quotient rule (Exercise 6.4.4). As you can
see, the several variable chain rule is quite powerful and can be used to deduce many
other rules of differentiation.

We record one further useful application of the chain rule. Let $T : \mathbf{R}^n \to \mathbf{R}^m$
be a linear transformation. From Exercise 6.4.1 we observe that $T$ is continuously
differentiable at every point, and in fact $T'(x) = T$ for every $x$. (This equation may
look a little strange, but perhaps it is easier to swallow if you view it in the form
$\frac{d}{dx}(Tx) = T$.) Thus, for any differentiable function $f : E \to \mathbf{R}^n$, we see that $Tf : E \to \mathbf{R}^m$
is also differentiable, and hence by the chain rule

$$(Tf)'(x_0) = T(f'(x_0)).$$

This is a generalization of the single-variable calculus rule $(cf)' = c(f')$ for constant
scalars $c$.

Another special case of the chain rule which is quite useful is the following:
if $f : \mathbf{R}^n \to \mathbf{R}^m$ is some differentiable function, and $x_j : \mathbf{R} \to \mathbf{R}$ are differentiable
functions for each $j = 1, \ldots, n$, then

$$\frac{d}{dt} f(x_1(t), x_2(t), \ldots, x_n(t)) = \sum_{j=1}^n x_j'(t) \frac{\partial f}{\partial x_j}(x_1(t), x_2(t), \ldots, x_n(t)).$$

(Why is this a special case of the chain rule?)
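
The last formula can be tested on a simple instance. In the sketch below we take $n = 2$ and $m = 1$, with the hypothetical choices $f(x, y) := x^2 y$, $x_1(t) := \cos t$ and $x_2(t) := \sin t$ (none of these come from the text), and compare the right-hand side of the formula with a central difference quotient of $t \mapsto f(x_1(t), x_2(t))$.

```python
import math


def f(x, y):
    """A hypothetical differentiable function f(x, y) = x^2 * y."""
    return x * x * y


def df_dx(x, y):
    return 2 * x * y


def df_dy(x, y):
    return x * x


def curve(t):
    """The curve (x_1(t), x_2(t)) = (cos t, sin t)."""
    return (math.cos(t), math.sin(t))


def curve_velocity(t):
    """The derivatives (x_1'(t), x_2'(t)) = (-sin t, cos t)."""
    return (-math.sin(t), math.cos(t))


def chain_rule_derivative(t):
    """Sum over j of x_j'(t) * (df/dx_j)(x_1(t), x_2(t))."""
    x, y = curve(t)
    vx, vy = curve_velocity(t)
    return vx * df_dx(x, y) + vy * df_dy(x, y)


def difference_quotient(t, h=1e-6):
    """Central-difference estimate of d/dt f(x_1(t), x_2(t))."""
    return (f(*curve(t + h)) - f(*curve(t - h))) / (2 * h)


t = 0.7
print(chain_rule_derivative(t), difference_quotient(t))
```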
— Exercise — 


Exercise 6.4.1 Let $T : \mathbf{R}^n \to \mathbf{R}^m$ be a linear transformation. Show that $T$ is
continuously differentiable at every point, and in fact $T'(x) = T$ for every $x$. What is
$DT$?


Exercise 6.4.2 Let $E$ be a subset of $\mathbf{R}^n$. Prove that if a function $f : E \to \mathbf{R}^m$ is
differentiable at an interior point $x_0$ of $E$, then it is also continuous at $x_0$. (Hint: use
Exercise 6.1.4.)


Exercise 6.4.3 Prove Theorem 6.4.1. (Hint: you may wish to review the proof of the 
ordinary chain rule in single-variable calculus, Theorem 10.1.15. The easiest way to 
proceed is by using the sequence-based definition of limit (see Proposition 3.1.5(b)), 
and use Exercise 6.1.4.) 


Exercise 6.4.4 State and prove some version of the quotient rule for functions of
several variables (i.e., functions of the form $f : E \to \mathbf{R}$ for some subset $E$ of $\mathbf{R}^n$). In
other words, state a rule which gives a formula for the gradient of $f/g$; compare your
answer with Theorem 10.1.13(h). Be sure to make clear what all your assumptions
are.


Exercise 6.4.5 Let $x : \mathbf{R} \to \mathbf{R}^3$ be a differentiable function, and let $r : \mathbf{R} \to \mathbf{R}$ be
the function $r(t) := \|x(t)\|$, where $\|x\|$ denotes the length of $x$ as measured in the usual
$l^2$ metric. Let $t_0$ be a real number. Show that if $r(t_0) \ne 0$, then $r$ is differentiable at
$t_0$, and

$$r'(t_0) = \frac{x'(t_0) \cdot x(t_0)}{r(t_0)}.$$


(Hint: use Theorem 6.4.1.) 


6.5 Double Derivatives and Clairaut’s Theorem 


We now investigate what happens if one differentiates a function twice. 


Definition 6.5.1 (Twice continuous differentiability) Let $E$ be an open subset of $\mathbf{R}^n$,
and let $f : E \to \mathbf{R}^m$ be a function. We say that $f$ is twice continuously differentiable
if it is continuously differentiable, and the partial derivatives $\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}$ are
themselves continuously differentiable.


Remark 6.5.2 Continuously differentiable functions are sometimes called $C^1$ functions;
twice continuously differentiable functions are sometimes called $C^2$ functions.
One can also define $C^3$, $C^4$, etc., but we shall not do so here.


Example 6.5.3 Let $f : \mathbf{R}^2 \to \mathbf{R}^2$ be the function $f(x, y) = (x^2 + xy, y^2)$. Then $f$
is continuously differentiable because the partial derivatives $\frac{\partial f}{\partial x}(x, y) = (2x + y, 0)$
and $\frac{\partial f}{\partial y}(x, y) = (x, 2y)$ exist and are continuous on all of $\mathbf{R}^2$. It is also twice
continuously differentiable, because the double partial derivatives $\frac{\partial}{\partial x}\frac{\partial f}{\partial x}(x, y) = (2, 0)$,
$\frac{\partial}{\partial y}\frac{\partial f}{\partial x}(x, y) = (1, 0)$, $\frac{\partial}{\partial x}\frac{\partial f}{\partial y}(x, y) = (1, 0)$, $\frac{\partial}{\partial y}\frac{\partial f}{\partial y}(x, y) = (0, 2)$ all exist and are
continuous.

Observe in the above example that the double derivatives $\frac{\partial}{\partial y}\frac{\partial f}{\partial x}$ and $\frac{\partial}{\partial x}\frac{\partial f}{\partial y}$ are the
same. This is in fact a general phenomenon:

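Theorem 6.5.4 (Clairaut's theorem) Let $E$ be an open subset of $\mathbf{R}^n$, and let $f : E \to \mathbf{R}^m$
be a twice continuously differentiable function on $E$. Then we have
$\frac{\partial}{\partial x_j}\frac{\partial f}{\partial x_i}(x_0) = \frac{\partial}{\partial x_i}\frac{\partial f}{\partial x_j}(x_0)$ for all $1 \le i, j \le n$.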


Proof By working with one component of f at a time we can assume that m = 1. 
The claim is trivial if i = j, so we shall assume that i 4 j. We shall prove the theorem 
for x9 = 0; the general case is similar. (Actually, once one proves Clairaut’s theorem 
for x9 = 0, one can immediately obtain it for general x9 by applying the theorem 
with f (x) replaced by f(x + Xo). ) 

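Let $a$ be the number $a := \frac{\partial}{\partial x_j}\frac{\partial f}{\partial x_i}(0)$, and let $a'$ denote the quantity
$a' := \frac{\partial}{\partial x_i}\frac{\partial f}{\partial x_j}(0)$. Our task is to show that $a' = a$.
Let $\varepsilon > 0$. Because the double derivatives of $f$ are continuous, we can find a $\delta > 0$
such that

$$\left| \frac{\partial}{\partial x_j}\frac{\partial f}{\partial x_i}(x) - a \right| \le \varepsilon$$

and

$$\left| \frac{\partial}{\partial x_i}\frac{\partial f}{\partial x_j}(x) - a' \right| \le \varepsilon$$

whenever $\|x\| \le 2\delta$.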
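Now we consider the quantity

$$X := f(\delta e_i + \delta e_j) - f(\delta e_i) - f(\delta e_j) + f(0).$$

From the fundamental theorem of calculus in the $e_i$ variable, we have

$$f(\delta e_i + \delta e_j) - f(\delta e_j) = \int_0^\delta \frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j) \, dx_i$$

and

$$f(\delta e_i) - f(0) = \int_0^\delta \frac{\partial f}{\partial x_i}(x_i e_i) \, dx_i$$

and hence

$$X = \int_0^\delta \left( \frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j) - \frac{\partial f}{\partial x_i}(x_i e_i) \right) dx_i.$$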

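But by the mean value theorem, for each $x_i$ we have

$$\frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j) - \frac{\partial f}{\partial x_i}(x_i e_i) = \delta \, \frac{\partial}{\partial x_j}\frac{\partial f}{\partial x_i}(x_i e_i + x_j e_j)$$

for some $0 \le x_j \le \delta$. By our construction of $\delta$, we thus have

$$\left| \frac{\partial f}{\partial x_i}(x_i e_i + \delta e_j) - \frac{\partial f}{\partial x_i}(x_i e_i) - \delta a \right| \le \varepsilon \delta.$$

Integrating this from $0$ to $\delta$, we thus obtain

$$|X - \delta^2 a| \le \varepsilon \delta^2.$$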


We can run the same argument with the role of $i$ and $j$ reversed (note that $X$ is
symmetric in $i$ and $j$), to obtain

$$|X - \delta^2 a'| \le \varepsilon \delta^2.$$

From the triangle inequality we thus obtain

$$|\delta^2 a - \delta^2 a'| \le 2 \varepsilon \delta^2,$$

and thus

$$|a - a'| \le 2\varepsilon.$$

But this is true for all $\varepsilon > 0$, and $a$ and $a'$ do not depend on $\varepsilon$, and so we must have
$a = a'$, as desired.


One should caution that Clairaut’s theorem fails if we do not assume the double 
derivatives to be continuous; see Exercise 6.5.1. 
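
The quantity $X$ from the proof can also be examined numerically. The following is a minimal sketch, not part of the original argument: it assumes $n = 2$, $i = 1$, $j = 2$, and a hypothetical smooth test function `f` chosen only for illustration, and it prints $X/\delta^2$ for a few small values of $\delta$. Since the mixed partial derivatives of this `f` at the origin are both equal to 2, the printed values should approach 2.

```python
import math

def second_difference(f, delta):
    """Return X / delta^2, where X = f(d, d) - f(d, 0) - f(0, d) + f(0, 0)."""
    x = f(delta, delta) - f(delta, 0.0) - f(0.0, delta) + f(0.0, 0.0)
    return x / delta**2

def f(x, y):
    # Hypothetical smooth test function (an assumption for illustration).
    # Both mixed partials at (0, 0) equal cos(0) * exp(0) + 1 = 2.
    return math.sin(x) * math.exp(y) + x * y

for delta in (1e-1, 1e-2, 1e-3):
    print(delta, second_difference(f, delta))
```

Because $X$ is symmetric in the two variables, the same number approximates both $a$ and $a'$, which is the heart of the proof above.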


— Exercise — 


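Exercise 6.5.1 Let $f: \mathbf{R}^2 \to \mathbf{R}$ be the function defined by $f(x, y) := \frac{x y^3}{x^2 + y^2}$ when
$(x, y) \ne (0, 0)$, and $f(0, 0) := 0$. Show that $f$ is continuously differentiable, and the
double derivatives $\frac{\partial}{\partial y}\frac{\partial f}{\partial x}$ and $\frac{\partial}{\partial x}\frac{\partial f}{\partial y}$ exist, but are not equal to each other at $(0, 0)$.
Explain why this does not contradict Clairaut's theorem.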


6.6 The Contraction Mapping Theorem 


Before we turn to the next topic—namely the inverse function theorem—we need 
to develop a useful fact from the theory of complete metric spaces, namely the 
contraction mapping theorem. 


Definition 6.6.1 (Contraction) Let $(X, d)$ be a metric space, and let $f: X \to X$ be
a map. We say that $f$ is a contraction if we have $d(f(x), f(y)) \le d(x, y)$ for all
$x, y \in X$. We say that $f$ is a strict contraction if there exists a constant $0 < c < 1$
such that $d(f(x), f(y)) \le c \, d(x, y)$ for all $x, y \in X$; we call $c$ the contraction constant
of $f$.


Examples 6.6.2 The map $f: \mathbf{R} \to \mathbf{R}$ defined by $f(x) := x + 1$ is a contraction but not
a strict contraction. The map $f: \mathbf{R} \to \mathbf{R}$ defined by $f(x) := x/2$ is a strict contraction.
The map $f: [0, 1] \to [0, 1]$ defined by $f(x) := x - x^2$ is a contraction but not a strict
contraction. (For justifications of these statements, see Exercise 6.6.5.)


Definition 6.6.3 (Fixed points) Let $f: X \to X$ be a map, and $x \in X$. We say that $x$
is a fixed point of $f$ if $f(x) = x$.


Contractions do not necessarily have any fixed points; for instance, the map
$f: \mathbf{R} \to \mathbf{R}$ defined by $f(x) = x + 1$ does not. However, it turns out that strict
contractions always do, at least when $X$ is complete:


Theorem 6.6.4 (Contraction mapping theorem) Let $(X, d)$ be a metric space, and
let $f: X \to X$ be a strict contraction. Then $f$ can have at most one fixed point.
Moreover, if we also assume that $X$ is non-empty and complete, then $f$ has exactly
one fixed point.


Proof See Exercise 6.6.7. 
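
The iteration suggested in the hint to Exercise 6.6.7 is easy to try on a computer. The sketch below is only an illustration under stated assumptions: the helper name `fixed_point`, the tolerance, and the choice of the cosine map are ours. On the complete space $[0, 1]$ the cosine function maps $[0, 1]$ into itself and satisfies $|\cos x - \cos y| \le \sin(1) |x - y|$, so it is a strict contraction there.

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = f(x_n) until successive iterates are within tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) <= tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

# cos is a strict contraction on [0, 1] with contraction constant sin(1) < 1.
x_star = fixed_point(math.cos, 0.5)
print(x_star, math.cos(x_star))
```

The two printed numbers should agree to many decimal places, and starting from any other point of $[0, 1]$ should lead to the same limit, in line with the uniqueness part of the theorem.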


Remark 6.6.5 The contraction mapping theorem is one example of a fixed point
theorem, that is, a theorem which guarantees, assuming certain conditions, that a map will
have a fixed point. There are a number of other fixed point theorems which are also
useful. One amusing one is the so-called hairy ball theorem, which (among other
things) states that any continuous map $f: S^2 \to S^2$ from the sphere
$S^2 := \{(x, y, z) \in \mathbf{R}^3 : x^2 + y^2 + z^2 = 1\}$ to itself, must contain either a fixed point, or an anti-fixed
point (a point $x \in S^2$ such that $f(x) = -x$). A proof of this theorem can be found in
any topology text; it is beyond the scope of this text.


We shall give one consequence of the contraction mapping theorem which is
important for our application to the inverse function theorem. Basically, this says
that any map $f$ on a ball which is a “small” perturbation of the identity map, remains
one-to-one and cannot create any internal holes in the ball.


Lemma 6.6.6 Let $B(0, r)$ be a ball in $\mathbf{R}^n$ centered at the origin, and let
$g: B(0, r) \to \mathbf{R}^n$ be a map such that $g(0) = 0$ and

$$\|g(x) - g(y)\| \le \frac{1}{2} \|x - y\|$$

for all $x, y \in B(0, r)$ (here $\|x\|$ denotes the length of $x$ in $\mathbf{R}^n$). Then the function
$f: B(0, r) \to \mathbf{R}^n$ defined by $f(x) := x + g(x)$ is one-to-one, and furthermore the
image $f(B(0, r))$ of this map contains the ball $B(0, r/2)$.

Proof We first show that $f$ is one-to-one. Suppose for sake of contradiction that we
had two different points $x, y \in B(0, r)$ such that $f(x) = f(y)$. But then we would
have $x + g(x) = y + g(y)$, and hence

$$\|g(x) - g(y)\| = \|x - y\|.$$

The only way this can be consistent with our hypothesis $\|g(x) - g(y)\| \le \frac{1}{2} \|x - y\|$
is if $\|x - y\| = 0$, i.e., if $x = y$, a contradiction. Thus $f$ is one-to-one.

Now we show that $f(B(0, r))$ contains $B(0, r/2)$. Let $y$ be any point in $B(0, r/2)$;
our objective is to find a point $x \in B(0, r)$ such that $f(x) = y$, or in other words that
$x = y - g(x)$. So the problem is now to find a fixed point of the map $x \mapsto y - g(x)$.

Let $F: B(0, r) \to B(0, r)$ denote the function $F(x) := y - g(x)$. Observe that if
$x \in B(0, r)$, then

$$\|F(x)\| \le \|y\| + \|g(x)\| < \frac{r}{2} + \|g(x) - g(0)\| \le \frac{r}{2} + \frac{1}{2} \|x - 0\| < \frac{r}{2} + \frac{r}{2} = r,$$

so $F$ does indeed map $B(0, r)$ to itself. The same argument shows that for a sufficiently
small $\varepsilon > 0$, $F$ maps the closed ball $\overline{B(0, r - \varepsilon)}$ to itself. Also, for any $x, x'$ in $B(0, r)$
we have

$$\|F(x) - F(x')\| = \|g(x') - g(x)\| \le \frac{1}{2} \|x' - x\|,$$

so $F$ is a strict contraction on $B(0, r)$, and hence on the complete space $\overline{B(0, r - \varepsilon)}$.
By the contraction mapping theorem, $F$ has a fixed point, i.e., there exists an $x$ such
that $x = y - g(x)$. But this means that $f(x) = y$, as desired.
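
The proof is constructive in spirit: to solve $f(x) = y$ one iterates the map $F(x) := y - g(x)$. The following sketch illustrates this in $\mathbf{R}^2$. The particular perturbation `g`, the radius `r = 1`, the target point `y`, and the helper name `solve` are all assumptions made for this example and do not come from the text.

```python
import math

def g(x):
    # Hypothetical perturbation with g(0) = 0. Since sin and cos are
    # 1-Lipschitz, ||g(x) - g(y)|| <= (1/4) ||x - y|| <= (1/2) ||x - y||.
    return (0.25 * math.sin(x[1]), 0.25 * (math.cos(x[0]) - 1.0))

def f(x):
    gx = g(x)
    return (x[0] + gx[0], x[1] + gx[1])

def solve(y, steps=60):
    """Approximate the x with f(x) = y by iterating F(x) = y - g(x) from 0."""
    x = (0.0, 0.0)
    for _ in range(steps):
        gx = g(x)
        x = (y[0] - gx[0], y[1] - gx[1])
    return x

r = 1.0
y = (0.3, -0.2)   # a point of B(0, r/2), since its length is about 0.36
x = solve(y)
print(x, f(x))
```

The second printed pair should reproduce `y` up to rounding, and the solution `x` should lie inside $B(0, r)$, as the lemma predicts.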


— Exercise — 


Exercise 6.6.1 Let $f: [a, b] \to [a, b]$ be a differentiable function of one variable
such that $|f'(x)| \le 1$ for all $x \in [a, b]$. Prove that $f$ is a contraction. (Hint: use the
mean value theorem, Corollary 10.2.9.) If in addition $|f'(x)| < 1$ for all $x \in [a, b]$
and $f'$ is continuous, show that $f$ is a strict contraction.


Exercise 6.6.2 Show that if $f: [a, b] \to \mathbf{R}$ is differentiable and is a contraction,
then $|f'(x)| \le 1$.


Exercise 6.6.3 Give an example of a function $f: [a, b] \to \mathbf{R}$ which is continuously
differentiable and such that $|f(x) - f(y)| < |x - y|$ for all distinct $x, y \in [a, b]$, but
such that $|f'(x)| = 1$ for at least one value of $x \in [a, b]$.


Exercise 6.6.4 Give an example of a function $f: [a, b] \to \mathbf{R}$ which is a strict
contraction but which is not differentiable for at least one point $x$ in $[a, b]$.


Exercise 6.6.5 Verify the claims in Examples 6.6.2. 


Exercise 6.6.6 Show that every contraction on a metric space $X$ is necessarily
continuous.


Exercise 6.6.7 Prove Theorem 6.6.4. (Hint: to prove that there is at most one fixed
point, argue by contradiction. To prove that there is at least one fixed point, pick
any $x_0 \in X$ and define recursively $x_1 = f(x_0)$, $x_2 = f(x_1)$, $x_3 = f(x_2)$, etc. Prove
inductively that $d(x_{n+1}, x_n) \le c^n d(x_1, x_0)$, and conclude (using the geometric series
formula, Lemma 7.3.3) that the sequence $(x_n)_{n=0}^\infty$ is a Cauchy sequence. Then prove
that the limit of this sequence is a fixed point of $f$.)


Exercise 6.6.8 Let $(X, d)$ be a complete metric space, and let $f: X \to X$ and
$g: X \to X$ be two strict contractions on $X$ with contraction coefficients $c$ and $c'$,
respectively. From Theorem 6.6.4 we know that $f$ has some fixed point $x_0$, and
$g$ has some fixed point $y_0$. Suppose we know that there is an $\varepsilon > 0$ such that
$d(f(x), g(x)) \le \varepsilon$ for all $x \in X$ (i.e., $f$ and $g$ are within $\varepsilon$ of each other in the
uniform metric). Show that $d(x_0, y_0) \le \varepsilon / (1 - \min(c, c'))$. Thus nearby contractions
have nearby fixed points.


6.7 The Inverse Function Theorem in Several Variable
Calculus


We recall the inverse function theorem in single-variable calculus (Theorem 10.4.2),
which asserts that if a function $f: \mathbf{R} \to \mathbf{R}$ is invertible, differentiable, and $f'(x_0)$ is
nonzero, then $f^{-1}$ is differentiable at $f(x_0)$, and

$$(f^{-1})'(f(x_0)) = \frac{1}{f'(x_0)}.$$


In fact, one can say something even when $f$ is not invertible, as long as we know
that $f$ is continuously differentiable. If $f'(x_0)$ is nonzero, then $f'(x_0)$ must be either
strictly positive or strictly negative, which implies (since we are assuming $f'$ to be
continuous) that $f'(x)$ is either strictly positive for $x$ near $x_0$, or strictly negative
for $x$ near $x_0$. In particular, $f$ must be either strictly increasing near $x_0$, or strictly
decreasing near $x_0$. In either case, $f$ will become invertible if we restrict the domain
and codomain of $f$ to be sufficiently close to $x_0$ and to $f(x_0)$, respectively. (The
technical terminology for this is that $f$ is locally invertible near $x_0$.)
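
The single-variable formula $(f^{-1})'(f(x_0)) = 1/f'(x_0)$ can be checked numerically. The sketch below is only an illustration: the function $f(x) = x + x^3$, the point `x0`, the bisection routine `f_inverse`, and the step size `h` are all assumptions made for this example. The inverse is computed by bisection, which is valid here because this $f$ is strictly increasing.

```python
def f(x):
    # Hypothetical example: strictly increasing, since f'(x) = 1 + 3x^2 > 0.
    return x + x**3

def f_prime(x):
    return 1.0 + 3.0 * x**2

def f_inverse(y, lo=-10.0, hi=10.0, iters=200):
    """Invert the increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x0 = 0.7
y0 = f(x0)
h = 1e-6
# Central difference quotient for the derivative of the inverse at f(x0).
numeric = (f_inverse(y0 + h) - f_inverse(y0 - h)) / (2 * h)
print(numeric, 1.0 / f_prime(x0))
```

The two printed values should agree to several decimal places.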

The requirement that f be continuously differentiable is important; see Exercise 
6.7.1. 

It turns out that a similar theorem is true for functions $f: \mathbf{R}^n \to \mathbf{R}^n$ from one
Euclidean space to the same space. However, the condition that $f'(x_0)$ is nonzero
must be replaced with a slightly different one, namely that $f'(x_0)$ is invertible. We
first remark that the inverse of a linear transformation is also linear:


Lemma 6.7.1 Let $T: \mathbf{R}^n \to \mathbf{R}^n$ be a linear transformation which is also invertible.
Then the inverse transformation $T^{-1}: \mathbf{R}^n \to \mathbf{R}^n$ is also linear.


Proof See Exercise 6.7.2.


We can now prove an important and useful theorem, arguably one of the most
important theorems in several variable differential calculus.


Theorem 6.7.2 (Inverse function theorem) Let $E$ be an open subset of $\mathbf{R}^n$, and let
$f: E \to \mathbf{R}^n$ be a function which is continuously differentiable on $E$. Suppose $x_0 \in E$ is
such that the linear transformation $f'(x_0): \mathbf{R}^n \to \mathbf{R}^n$ is invertible. Then there exists
an open set $U$ in $E$ containing $x_0$, and an open set $V$ in $\mathbf{R}^n$ containing $f(x_0)$, such
that $f$ is a bijection from $U$ to $V$. In particular, there is an inverse map $f^{-1}: V \to U$.
Furthermore, this inverse map is differentiable at $f(x_0)$, and

$$(f^{-1})'(f(x_0)) = (f'(x_0))^{-1}.$$


Proof We first observe that once we know the inverse map $f^{-1}$ is differentiable, the
formula $(f^{-1})'(f(x_0)) = (f'(x_0))^{-1}$ is automatic. This comes from starting with the
identity

$$I = f^{-1} \circ f$$

on $U$, where $I: \mathbf{R}^n \to \mathbf{R}^n$ is the identity map $Ix := x$, and then differentiating both
sides using the chain rule at $x_0$ to obtain

$$I'(x_0) = (f^{-1})'(f(x_0)) f'(x_0).$$

Since $I'(x_0) = I$, we thus have $(f^{-1})'(f(x_0)) = (f'(x_0))^{-1}$ as desired.

We remark that this argument shows that if $f'(x_0)$ is not invertible, then there is
no way that an inverse $f^{-1}$ can exist and be differentiable at $f(x_0)$.

Next, we observe that it suffices to prove the theorem under the additional assumption
$f(x_0) = 0$. The general case then follows from the special case by replacing $f$ by
a new function $\tilde f(x) := f(x) - f(x_0)$ and then applying the special case to $\tilde f$ (note that
$V$ will have to shift by $f(x_0)$). Note that $f^{-1}(y) = \tilde f^{-1}(y - f(x_0))$ (why?). Henceforth
we will always assume $f(x_0) = 0$.

In a similar manner, one can make the assumption $x_0 = 0$. The general case
then follows from this case by replacing $f$ by a new function $\tilde f(x) := f(x + x_0)$ and
applying the special case to $\tilde f$ (note that $E$ and $U$ will have to shift by $x_0$). Note that
$f^{-1}(y) = \tilde f^{-1}(y) + x_0$ (why?). Henceforth we will always assume $x_0 = 0$. Thus we
now have that $f(0) = 0$ and that $f'(0)$ is invertible.

Finally, one can assume that $f'(0) = I$, where $I: \mathbf{R}^n \to \mathbf{R}^n$ is the identity
transformation $Ix = x$. The general case then follows from this case by replacing $f$ with
a new function $\tilde f: E \to \mathbf{R}^n$ defined by $\tilde f(x) := f'(0)^{-1} f(x)$, and applying the special
case to $\tilde f$. Note from Lemma 6.7.1 that $f'(0)^{-1}$ is a linear transformation. In
particular, we note that $\tilde f(0) = 0$ and that

$$\tilde f'(0) = f'(0)^{-1} f'(0) = I,$$

so by the special case of the inverse function theorem we know that there exists an
open set $U'$ containing $0$, and an open set $V'$ containing $0$, such that $\tilde f$ is a bijection
from $U'$ to $V'$, and that $\tilde f^{-1}: V' \to U'$ is differentiable at $0$ with derivative $I$. But we
have $f(x) = f'(0) \tilde f(x)$, and hence $f$ is a bijection from $U'$ to $f'(0)(V')$ (note that $f'(0)$
is also a bijection). Since $f'(0)$ and its inverse are both continuous, $f'(0)(V')$ is open,
and it certainly contains $0$. Now consider the inverse function $f^{-1}: f'(0)(V') \to U'$.
Since $f(x) = f'(0) \tilde f(x)$, we see that $f^{-1}(y) = \tilde f^{-1}(f'(0)^{-1} y)$ for all $y \in f'(0)(V')$
(why? use the fact that $\tilde f$ is a bijection from $U'$ to $V'$). In particular we see that $f^{-1}$
is differentiable at $0$.

So all we have to do now is prove the inverse function theorem in the special
case, when $x_0 = 0$, $f(x_0) = 0$, and $f'(x_0) = I$. Let $g: E \to \mathbf{R}^n$ denote the function
$g(x) := f(x) - x$. Then $g(0) = 0$ and $g'(0) = 0$. In particular

$$\frac{\partial g}{\partial x_j}(0) = 0$$

for $j = 1, \ldots, n$. Since $g$ is continuously differentiable, there thus exists a ball $B(0, r)$
in $E$ such that

$$\left\| \frac{\partial g}{\partial x_j}(x) \right\| \le \frac{1}{2n^2}$$

for all $x \in B(0, r)$ and all $1 \le j \le n$. (There is nothing particularly special about
$\frac{1}{2n^2}$; we just need a nice small number here.) In particular, for any $x \in B(0, r)$ and
$v = (v_1, \ldots, v_n)$ we have

$$\|D_v g(x)\| = \left\| \sum_{j=1}^n v_j \frac{\partial g}{\partial x_j}(x) \right\| \le \sum_{j=1}^n |v_j| \left\| \frac{\partial g}{\partial x_j}(x) \right\| \le \sum_{j=1}^n \|v\| \frac{1}{2n^2} = \frac{1}{2n} \|v\|.$$

But now for any $x, y \in B(0, r)$, we have by the fundamental theorem of calculus

$$g(y) - g(x) = \int_0^1 \frac{d}{dt} g(x + t(y - x)) \, dt = \int_0^1 D_{y-x} g(x + t(y - x)) \, dt,$$

where the integral of a vector-valued function is defined by integrating each
component separately. By the previous remark, the vectors $D_{y-x} g(x + t(y - x))$ have a
magnitude of at most $\frac{1}{2n} \|y - x\|$. Thus every component of these vectors has
magnitude at most $\frac{1}{2n} \|y - x\|$. Thus every component of $g(y) - g(x)$ has magnitude at most
$\frac{1}{2n} \|y - x\|$, and hence $g(y) - g(x)$ itself has magnitude at most $\frac{1}{2} \|y - x\|$ (actually, it
will be substantially less than this, but this bound will be enough for our purposes).
In other words, $g$ is a contraction. By Lemma 6.6.6, the map $f = g + I$ is thus
one-to-one on $B(0, r)$, and the image $f(B(0, r))$ contains $B(0, r/2)$. In particular we have
an inverse map $f^{-1}: B(0, r/2) \to B(0, r)$ defined on $B(0, r/2)$.
Applying the contraction bound with $y = 0$ we obtain in particular that

$$\|g(x)\| \le \frac{1}{2} \|x\|$$

for all $x \in B(0, r)$, and so by the triangle inequality

$$\frac{1}{2} \|x\| \le \|f(x)\| \le \frac{3}{2} \|x\|$$

for all $x \in B(0, r)$.

Now we set $V := B(0, r/2)$ and $U := f^{-1}(V) \cap B(0, r)$. Then by construction $f$ is
a bijection from $U$ to $V$. $V$ is clearly open, and $U$ is also open since $f$ is continuous.
(Notice that if a set is open relative to $B(0, r)$, then it is open in $\mathbf{R}^n$ as well.) Now
we want to show that $f^{-1}: V \to U$ is differentiable at $0$ with derivative $I^{-1} = I$. In
other words, we wish to show that

$$\lim_{x \to 0; \, x \in V \setminus \{0\}} \frac{\|f^{-1}(x) - f^{-1}(0) - I(x - 0)\|}{\|x\|} = 0.$$

Since $f(0) = 0$, we have $f^{-1}(0) = 0$, and the above simplifies to

$$\lim_{x \to 0; \, x \in V \setminus \{0\}} \frac{\|f^{-1}(x) - x\|}{\|x\|} = 0.$$

Let $(x_n)_{n=1}^\infty$ be any sequence in $V \setminus \{0\}$ that converges to $0$. By Proposition 3.1.5(b),
it suffices to show that

$$\lim_{n \to \infty} \frac{\|f^{-1}(x_n) - x_n\|}{\|x_n\|} = 0.$$

Write $y_n := f^{-1}(x_n)$. Then $y_n \in B(0, r)$ and $x_n = f(y_n)$. In particular we have

$$\frac{1}{2} \|y_n\| \le \|x_n\| \le \frac{3}{2} \|y_n\|$$

and so since $\|x_n\|$ goes to $0$, $\|y_n\|$ goes to zero also, and their ratio remains bounded.
It will thus suffice to show that

$$\lim_{n \to \infty} \frac{\|y_n - f(y_n)\|}{\|y_n\|} = 0.$$

But since $y_n$ is going to $0$, and $f$ is differentiable at $0$, we have

$$\lim_{n \to \infty} \frac{\|f(y_n) - f(0) - f'(0)(y_n - 0)\|}{\|y_n\|} = 0$$

as desired (since $f(0) = 0$ and $f'(0) = I$).


The inverse function theorem gives a useful criterion for when a function is
(locally) invertible at a point $x_0$: all we need is for its derivative $f'(x_0)$ to be invertible
(and then we even get further information, for instance we can compute the derivative
of $f^{-1}$ at $f(x_0)$). Of course, this raises the question of how one can tell whether the
linear transformation $f'(x_0)$ is invertible or not. Recall that we have $f'(x_0) = L_{Df(x_0)}$, so
by Lemmas 6.1.13 and 6.1.16 we see that the linear transformation $f'(x_0)$ is invertible
if and only if the matrix $Df(x_0)$ is. There are many ways to check whether a matrix
such as $Df(x_0)$ is invertible; for instance, one can use determinants, or alternatively
Gaussian elimination methods. We will not pursue this matter here, but refer the
reader to any linear algebra text.
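
To make the determinant test and the derivative formula concrete, here is a small numerical sketch in two dimensions. Everything in it is an assumption chosen for illustration: the map `f`, the base point `x0`, the helper names, and the use of Newton's method to evaluate the local inverse (a numerical tool swapped in for convenience, not the method of the proof). The sketch checks that $\det Df(x_0) \ne 0$ and compares a finite-difference derivative of the local inverse at $f(x_0)$ with the matrix inverse of $Df(x_0)$.

```python
def f(p):
    # Hypothetical map R^2 -> R^2 used only for illustration.
    x, y = p
    return (x + y**2, y + x**2)

def jacobian(p):
    """Derivative matrix Df(p) of the map above, computed by hand."""
    x, y = p
    return ((1.0, 2.0 * y), (2.0 * x, 1.0))

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inverse2(m):
    d = det2(m)
    if d == 0.0:
        raise ValueError("matrix is not invertible")
    return ((m[1][1] / d, -m[0][1] / d), (-m[1][0] / d, m[0][0] / d))

def local_inverse(q, start, steps=50):
    """Solve f(p) = q near `start` by Newton's method."""
    p = start
    for _ in range(steps):
        fp = f(p)
        inv = inverse2(jacobian(p))
        rx, ry = fp[0] - q[0], fp[1] - q[1]
        p = (p[0] - (inv[0][0] * rx + inv[0][1] * ry),
             p[1] - (inv[1][0] * rx + inv[1][1] * ry))
    return p

x0 = (0.2, 0.1)
y0 = f(x0)
h = 1e-6
# Finite-difference derivative of the local inverse at y0, column by column.
cols = []
for e in ((h, 0.0), (0.0, h)):
    plus = local_inverse((y0[0] + e[0], y0[1] + e[1]), x0)
    minus = local_inverse((y0[0] - e[0], y0[1] - e[1]), x0)
    cols.append(((plus[0] - minus[0]) / (2 * h), (plus[1] - minus[1]) / (2 * h)))
numeric = ((cols[0][0], cols[1][0]), (cols[0][1], cols[1][1]))
exact = inverse2(jacobian(x0))
print(det2(jacobian(x0)), numeric, exact)
```

The matrices `numeric` and `exact` should agree entry by entry up to the accuracy of the finite differences.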

If $f'(x_0)$ exists but is non-invertible, then the inverse function theorem does not
apply. In such a situation it is not possible for $f^{-1}$ to exist and be differentiable at $f(x_0)$;
this was remarked in the above proof. But it is still possible for $f$ to be invertible. For
instance, the single-variable function $f: \mathbf{R} \to \mathbf{R}$ defined by $f(x) = x^3$ is invertible
despite $f'(0)$ not being invertible.


— Exercise — 


Exercise 6.7.1 Let $f: \mathbf{R} \to \mathbf{R}$ be the function defined by $f(x) := x + x^2 \sin(1/x^4)$
for $x \ne 0$ and $f(0) := 0$. Show that $f$ is differentiable and $f'(0) = 1$, but $f$ is not
increasing on any open set containing $0$. (Hint: show that the derivative of $f$ can turn
negative arbitrarily close to $0$. Drawing a graph of $f$ may aid your intuition.)


Exercise 6.7.2 Prove Lemma 6.7.1.


Exercise 6.7.3 Let $f: \mathbf{R}^n \to \mathbf{R}^n$ be a continuously differentiable function such that
$f'(x)$ is an invertible linear transformation for every $x \in \mathbf{R}^n$. Show that whenever $V$
is an open set in $\mathbf{R}^n$, $f(V)$ is also open. (Hint: use the inverse function theorem.)


Exercise 6.7.4 Let the notation and hypotheses be as in Theorem 6.7.2. Show that
after shrinking the open sets $U$, $V$ as necessary (while still keeping $x_0$ in $U$ and
$f(x_0)$ in $V$), the derivative map $f'(x)$ is invertible for all $x \in U$, and that the inverse
map $f^{-1}$ is differentiable at every point of $V$ with $(f^{-1})'(f(x)) = (f'(x))^{-1}$ for all
$x \in U$. Finally, show that $f^{-1}$ is continuously differentiable on $V$.


6.8 The Implicit Function Theorem


Recall (from Exercise 3.5.10) that a function $f: \mathbf{R} \to \mathbf{R}$ gives rise to a graph

$$\{(x, f(x)) : x \in \mathbf{R}\}$$


which is a subset of $\mathbf{R}^2$, usually looking like a curve. However, not all curves are
graphs; they must obey the vertical line test, that for every $x$ there is exactly one $y$
such that $(x, y)$ is in the curve. For instance, the circle $\{(x, y) \in \mathbf{R}^2 : x^2 + y^2 = 1\}$ is
not a graph, although if one restricts to a semicircle such as
$\{(x, y) \in \mathbf{R}^2 : x^2 + y^2 = 1, y > 0\}$ then one again obtains a graph. Thus while the entire circle is not a graph,
certain local portions of it are. (The portions of the circle near $(1, 0)$ and $(-1, 0)$ are
not graphs over the variable $x$, but they are graphs over the variable $y$.)

Similarly, any function $g: \mathbf{R}^n \to \mathbf{R}$ gives rise to a graph $\{(x, g(x)) : x \in \mathbf{R}^n\}$ in
$\mathbf{R}^{n+1}$, which in general looks like some sort of $n$-dimensional surface in $\mathbf{R}^{n+1}$ (the
technical term for this is a hypersurface). Conversely, one may ask which
hypersurfaces are actually graphs of some function, and whether that function is continuous
or differentiable.

If the hypersurface is given geometrically, then one can again invoke the vertical
line test to work out whether it is a graph or not. But what if the hypersurface is given
algebraically, for instance the surface $\{(x, y, z) \in \mathbf{R}^3 : xy + yz + zx = -1\}$? Or more
generally, a hypersurface of the form $\{x \in \mathbf{R}^n : g(x) = 0\}$, where $g: \mathbf{R}^n \to \mathbf{R}$ is some
function? In this case, it is still possible to say whether the hypersurface is a graph,
locally at least, by means of the implicit function theorem.


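Theorem 6.8.1 (Implicit function theorem) Let $E$ be an open subset of $\mathbf{R}^n$, let
$f: E \to \mathbf{R}$ be continuously differentiable, and let $y = (y_1, \ldots, y_n)$ be a point in $E$
such that $f(y) = 0$ and $\frac{\partial f}{\partial x_n}(y) \ne 0$. Then there exists an open subset $U$ of $\mathbf{R}^{n-1}$
containing $(y_1, \ldots, y_{n-1})$, an open subset $V$ of $E$ containing $y$, and a function $g: U \to \mathbf{R}$
such that $g(y_1, \ldots, y_{n-1}) = y_n$ and

$$\{(x_1, \ldots, x_n) \in V : f(x_1, \ldots, x_n) = 0\} = \{(x_1, \ldots, x_{n-1}, g(x_1, \ldots, x_{n-1})) : (x_1, \ldots, x_{n-1}) \in U\}.$$

In other words, the set $\{x \in V : f(x) = 0\}$ is a graph of a function over $U$. Moreover,
$g$ is differentiable at $(y_1, \ldots, y_{n-1})$, and we have

$$\frac{\partial g}{\partial x_j}(y_1, \ldots, y_{n-1}) = - \frac{\partial f}{\partial x_j}(y) \Big/ \frac{\partial f}{\partial x_n}(y) \qquad (6.1)$$

for all $1 \le j \le n - 1$.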


Remark 6.8.2 Equation (6.1) is sometimes derived using implicit differentiation.
Basically, the point is that if you know that

$$f(x_1, \ldots, x_n) = 0$$

then (as long as $\frac{\partial f}{\partial x_n} \ne 0$) the variable $x_n$ is “implicitly” defined in terms of the other
$n - 1$ variables, and one can differentiate the above identity in, say, the $x_j$ direction
using the chain rule to obtain

$$\frac{\partial f}{\partial x_j} + \frac{\partial f}{\partial x_n} \frac{\partial x_n}{\partial x_j} = 0$$

which is (6.1) in disguise (we are using $g$ to represent the implicit function defining
$x_n$ in terms of $x_1, \ldots, x_{n-1}$). Thus, the implicit function theorem allows one to define
a dependence implicitly, by means of a constraint rather than by a direct formula of
the form $x_n = g(x_1, \ldots, x_{n-1})$.


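Proof This theorem looks somewhat fearsome, but actually it is a fairly quick
consequence of the inverse function theorem. Let $F: E \to \mathbf{R}^n$ be the function

$$F(x_1, \ldots, x_n) := (x_1, \ldots, x_{n-1}, f(x_1, \ldots, x_n)).$$

This function is continuously differentiable. Also note that

$$F(y) = (y_1, \ldots, y_{n-1}, 0)$$

and

$$DF(y) = \left( \frac{\partial F}{\partial x_1}(y)^T, \frac{\partial F}{\partial x_2}(y)^T, \ldots, \frac{\partial F}{\partial x_n}(y)^T \right)
= \begin{pmatrix}
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 \\
\frac{\partial f}{\partial x_1}(y) & \frac{\partial f}{\partial x_2}(y) & \cdots & \frac{\partial f}{\partial x_{n-1}}(y) & \frac{\partial f}{\partial x_n}(y)
\end{pmatrix}.$$

Since $\frac{\partial f}{\partial x_n}(y)$ is assumed by hypothesis to be nonzero, this matrix is invertible; this
can be seen either by computing the determinant, or using row reduction, or by
computing the inverse explicitly, which is

$$DF(y)^{-1} = \begin{pmatrix}
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 \\
-\frac{\partial f}{\partial x_1}(y)/a & -\frac{\partial f}{\partial x_2}(y)/a & \cdots & -\frac{\partial f}{\partial x_{n-1}}(y)/a & 1/a
\end{pmatrix}$$

where we have written $a = \frac{\partial f}{\partial x_n}(y)$ for short. Thus the inverse function theorem
applies, and we can find an open set $V$ in $E$ containing $y$, and an open set $W$ in
$\mathbf{R}^n$ containing $F(y) = (y_1, \ldots, y_{n-1}, 0)$, such that $F$ is a bijection from $V$ to $W$, and
that $F^{-1}$ is differentiable at $(y_1, \ldots, y_{n-1}, 0)$.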

Let us write $F^{-1}$ in co-ordinates as

$$F^{-1}(x) = (h_1(x), h_2(x), \ldots, h_n(x))$$

where $x \in W$. Since $F(F^{-1}(x)) = x$, we have $h_j(x_1, \ldots, x_n) = x_j$ for all
$1 \le j \le n - 1$ and $x \in W$, and

$$f(x_1, \ldots, x_{n-1}, h_n(x_1, \ldots, x_n)) = x_n.$$

Also, $h_n$ is differentiable at $(y_1, \ldots, y_{n-1}, 0)$ since $F^{-1}$ is.


Now we set $U := \{(x_1, \ldots, x_{n-1}) \in \mathbf{R}^{n-1} : (x_1, \ldots, x_{n-1}, 0) \in W\}$. Note that $U$ is
open and contains $(y_1, \ldots, y_{n-1})$. Now we define $g: U \to \mathbf{R}$ by
$g(x_1, \ldots, x_{n-1}) := h_n(x_1, \ldots, x_{n-1}, 0)$. Then $g$ is differentiable at $(y_1, \ldots, y_{n-1})$. Now we prove that

$$\{(x_1, \ldots, x_n) \in V : f(x_1, \ldots, x_n) = 0\} = \{(x_1, \ldots, x_{n-1}, g(x_1, \ldots, x_{n-1})) : (x_1, \ldots, x_{n-1}) \in U\}.$$

First suppose that $(x_1, \ldots, x_n) \in V$ and $f(x_1, \ldots, x_n) = 0$. Then we have
$F(x_1, \ldots, x_n) = (x_1, \ldots, x_{n-1}, 0)$, which lies in $W$. Thus $(x_1, \ldots, x_{n-1})$ lies in $U$. Applying
$F^{-1}$, we see that $(x_1, \ldots, x_n) = F^{-1}(x_1, \ldots, x_{n-1}, 0)$. In particular
$x_n = h_n(x_1, \ldots, x_{n-1}, 0)$, and hence $x_n = g(x_1, \ldots, x_{n-1})$. Thus every element of the left-hand set lies
in the right-hand set. The reverse inclusion comes by reversing all the above steps
and is left to the reader.

Finally, we show the formula for the partial derivatives of $g$. From the preceding
discussion we have

$$f(x_1, \ldots, x_{n-1}, g(x_1, \ldots, x_{n-1})) = 0$$

for all $(x_1, \ldots, x_{n-1}) \in U$. Since $g$ is differentiable at $(y_1, \ldots, y_{n-1})$, and $f$ is
differentiable at $(y_1, \ldots, y_{n-1}, g(y_1, \ldots, y_{n-1})) = y$, we may use the chain rule,
differentiating in $x_j$, to obtain

$$\frac{\partial f}{\partial x_j}(y) + \frac{\partial f}{\partial x_n}(y) \frac{\partial g}{\partial x_j}(y_1, \ldots, y_{n-1}) = 0$$

and the claim follows by simple algebra.


Example 6.8.3 Consider the surface $S := \{(x, y, z) \in \mathbf{R}^3 : xy + yz + zx = -1\}$,
which we rewrite as $\{(x, y, z) \in \mathbf{R}^3 : f(x, y, z) = 0\}$, where $f: \mathbf{R}^3 \to \mathbf{R}$ is the
function $f(x, y, z) := xy + yz + zx + 1$. Clearly $f$ is continuously differentiable, and
$\frac{\partial f}{\partial z} = y + x$. Thus for any $(x_0, y_0, z_0)$ in $S$ with $y_0 + x_0 \ne 0$, one can write this surface
(near $(x_0, y_0, z_0)$) as a graph of the form $\{(x, y, g(x, y)) : (x, y) \in U\}$ for some open
set $U$ containing $(x_0, y_0)$, and some function $g$ which is differentiable at $(x_0, y_0)$.
Indeed one can implicitly differentiate to obtain that

$$\frac{\partial g}{\partial x}(x_0, y_0) = - \frac{y_0 + z_0}{y_0 + x_0} \quad \text{and} \quad \frac{\partial g}{\partial y}(x_0, y_0) = - \frac{x_0 + z_0}{y_0 + x_0}.$$
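
For this particular surface one can check the formulas directly, because the equation $xy + yz + zx = -1$ can be solved for $z$ by hand whenever $x + y \ne 0$ (an observation used here for illustration only; the point of the implicit function theorem is that no such explicit formula is needed). The sketch below compares the formulas above with finite-difference quotients at a sample point; the point $(x_0, y_0) = (1, 2)$ and the step size `h` are arbitrary choices.

```python
def g(x, y):
    """Solve xy + yz + zx = -1 for z; valid where x + y != 0."""
    return (-1.0 - x * y) / (x + y)

def dg_dx_formula(x0, y0):
    z0 = g(x0, y0)
    return -(y0 + z0) / (y0 + x0)

def dg_dy_formula(x0, y0):
    z0 = g(x0, y0)
    return -(x0 + z0) / (y0 + x0)

x0, y0, h = 1.0, 2.0, 1e-6
# Central difference quotients for the two partial derivatives of g.
dg_dx_numeric = (g(x0 + h, y0) - g(x0 - h, y0)) / (2 * h)
dg_dy_numeric = (g(x0, y0 + h) - g(x0, y0 - h)) / (2 * h)
print(dg_dx_numeric, dg_dx_formula(x0, y0))
print(dg_dy_numeric, dg_dy_formula(x0, y0))
```

At this sample point $z_0 = -1$, so the formulas give $\frac{\partial g}{\partial x} = -\frac{1}{3}$ and $\frac{\partial g}{\partial y} = 0$, and the difference quotients should match these values closely.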


In the implicit function theorem, if the derivative $\frac{\partial f}{\partial x_n}$ equals zero at some point,
then it is unlikely that the set $\{x \in \mathbf{R}^n : f(x) = 0\}$ can be written as a graph of the $x_n$
variable in terms of the other $n - 1$ variables near that point. However, if some other
derivative $\frac{\partial f}{\partial x_j}$ is nonzero, then it would be possible to write the $x_j$ variable in terms of
the other $n - 1$ variables, by a variant of the implicit function theorem. Thus as long
as the gradient $\nabla f$ is not entirely zero, one can write this set $\{x \in \mathbf{R}^n : f(x) = 0\}$
as a graph of some variable $x_j$ in terms of the other $n - 1$ variables. (The circle
$\{(x, y) \in \mathbf{R}^2 : x^2 + y^2 - 1 = 0\}$ is a good example of this; it is not a graph of $y$ in
terms of $x$, or $x$ in terms of $y$, but near every point it is one of the two. And this is
because the gradient of $x^2 + y^2 - 1$ is never zero on the circle.) However, if $\nabla f$ does
vanish at some point $x_0$, then we say that $f$ has a critical point at $x_0$ and the behavior
there is much more complicated. For instance, the set $\{(x, y) \in \mathbf{R}^2 : x^2 - y^2 = 0\}$
has a critical point at $(0, 0)$ and there the set does not look like a graph of any sort
(it is the union of two lines).


Remark 6.8.4 Sets which look like graphs of continuous functions at every point
are called manifolds. Thus $\{x \in \mathbf{R}^n : f(x) = 0\}$ will be a manifold
if it contains no critical points of $f$. The theory of manifolds is very important in
modern geometry (especially differential geometry and algebraic geometry), but we
will not discuss it here as it is a graduate level topic.


— Exercise — 


Exercise 6.8.1 Let the notation and hypotheses be as in Theorem 6.8.1. Show that,
after shrinking the open sets $U$, $V$ as necessary, the function $g$ becomes
continuously differentiable on all of $U$, and Eq. (6.1) holds at all points of $U$.


Chapter 7
Lebesgue Measure


In the previous chapter we discussed differentiation in several variable calculus. It is
now only natural to consider the question of integration in several variable calculus.
The general question we wish to answer is this: given some subset $\Omega$ of $\mathbf{R}^n$, and some
real-valued function $f: \Omega \to \mathbf{R}$, is it possible to integrate $f$ on $\Omega$ to obtain some
number $\int_\Omega f$? (It is possible to consider other types of functions, such as
complex-valued or vector-valued functions, but this turns out not to be too difficult once one
knows how to integrate real-valued functions, since one can integrate a complex or
vector-valued function by integrating each real-valued component of that function
separately.)

In one dimension we already have developed (in Chap. 11) the notion of a Riemann
integral $\int_{[a, b]} f$, which answers this question when $\Omega$ is an interval $\Omega = [a, b]$, and
$f$ is Riemann integrable. Exactly what Riemann integrability means is not important
here, but let us just remark that every piecewise continuous function is Riemann
integrable, and in particular every piecewise constant function is Riemann integrable.
However, not all functions are Riemann integrable. It is possible to extend this notion
of a Riemann integral to higher dimensions, but it requires quite a bit of effort and
one can still only integrate “Riemann integrable” functions, which turn out to be a
rather unsatisfactorily small class of functions. (For instance, the pointwise limit of
Riemann integrable functions need not be Riemann integrable, and the same goes for
an $L^2$ limit, although we have already seen that uniform limits of Riemann integrable
functions remain Riemann integrable.)

Because of this, we must look beyond the Riemann integral to obtain a truly satis- 
factory notion of integration, one that can handle even very discontinuous functions. 
This leads to the notion of the Lebesgue integral, which we shall spend this chapter 
and the next constructing. The Lebesgue integral can handle a very large class of 
functions, including all the Riemann integrable functions but also many others as 
well; in fact, it is safe to say that it can integrate virtually any function that one actu- 
ally needs in mathematics, at least if one works on Euclidean spaces and everything 
is absolutely integrable. (If one assumes the axiom of choice, then there are still some
pathological functions one can construct which cannot be integrated by the Lebesgue
integral, but these functions will not come up in real-life applications.)

Before we turn to the details, we begin with an informal discussion. In order
to understand how to compute an integral $\int_\Omega f$, we must first understand a more
basic and fundamental question: how does one compute the length/area/volume of
$\Omega$? To see why this question is connected to that of integration, observe that if one
integrates the function $1$ on the set $\Omega$, then one should obtain the length of $\Omega$ (if $\Omega$ is
one-dimensional), the area of $\Omega$ (if $\Omega$ is two-dimensional), or the volume of $\Omega$ (if $\Omega$
is three-dimensional). To avoid splitting into cases depending on the dimension, we
shall refer to the measure of $\Omega$ as either the length, area, volume (or hypervolume,
etc.) of $\Omega$, depending on what Euclidean space $\mathbf{R}^n$ we are working in.

Ideally, to every subset $\Omega$ of $\mathbf{R}^n$ we would like to associate a non-negative
number $m(\Omega)$, which will be the measure of $\Omega$ (i.e., the length, area, volume,
etc.). We allow the possibility for $m(\Omega)$ to be zero (e.g., if $\Omega$ is just a single
point or the empty set) or for $m(\Omega)$ to be infinite (e.g., if $\Omega$ is all of $\mathbf{R}^n$).
This measure should obey certain reasonable properties; for instance, the measure
of the unit cube $(0, 1)^n := \{(x_1, \ldots, x_n) : 0 < x_i < 1\}$ should equal $1$, we
should have $m(A \cup B) = m(A) + m(B)$ if $A$ and $B$ are disjoint (and similarly that
$m(\bigcup_{n=1}^\infty A_n) = \sum_{n=1}^\infty m(A_n)$ when the $A_n$ are disjoint), we should have $m(A) \le m(B)$
whenever $A \subseteq B$, and we should have $m(x + A) = m(A)$ for any $x \in \mathbf{R}^n$
(i.e., if we shift $A$ by the vector $x$ the measure should be the same).

Remarkably, it turns out that such a measure does not exist; one cannot assign
a non-negative number to every subset of $\mathbf{R}^n$ which has the above properties. This
is quite a surprising fact, as it goes against one's intuitive concept of volume; we
shall prove it later in these notes. (An even more dramatic example of this failure of
intuition is the Banach-Tarski paradox, in which a unit ball in $\mathbf{R}^3$ is decomposed into
five pieces, and then the five pieces are reassembled via translations and rotations to
form two complete and disjoint unit balls, thus violating any concept of conservation
of volume; however we will not discuss this paradox here.)

What these paradoxes mean is that it is impossible to find a reasonable way to
assign a measure to every single subset of $\mathbf{R}^n$. However, we can salvage matters by
only measuring a certain class of sets in $\mathbf{R}^n$, namely the measurable sets. These are the
only sets $\Omega$ for which we will define the measure $m(\Omega)$, and once one restricts one's
attention to measurable sets, one recovers all the above properties again. Furthermore,
almost all the sets one encounters in real life are measurable (e.g., all open and closed
sets will be measurable), and so this turns out to be good enough to do analysis.


7.1 The Goal: Lebesgue Measure 


Let $\mathbf{R}^n$ be a Euclidean space. Our goal in this chapter is to define a concept of
measurable set, which will be a special kind of subset of $\mathbf{R}^n$, and for every such
measurable set $\Omega \subseteq \mathbf{R}^n$, we will define the Lebesgue measure $m(\Omega)$ to be a certain
number in $[0, +\infty]$. The concept of measurable set will obey the following properties:

(i) (Borel property) Every open set in $\mathbf{R}^n$ is measurable, as is every closed set.

(ii) (Complementarity) If $\Omega$ is measurable, then $\mathbf{R}^n \setminus \Omega$ is also measurable.

(iii) (Boolean algebra property) If $(\Omega_j)_{j \in J}$ is any finite collection of measurable
sets (so $J$ is finite), then the union $\bigcup_{j \in J} \Omega_j$ and intersection $\bigcap_{j \in J} \Omega_j$ are also
measurable.

(iv) ($\sigma$-algebra property) If $(\Omega_j)_{j \in J}$ is any countable collection of measurable
sets (so $J$ is countable), then the union $\bigcup_{j \in J} \Omega_j$ and intersection $\bigcap_{j \in J} \Omega_j$ are
also measurable.

Note that some of these properties are redundant; for instance, (iv) will imply 
(iii), and once one knows all open sets are measurable, (ii) will imply that all closed 
sets are measurable also. The properties (i-iv) will ensure that virtually every set 
one cares about is measurable; though as indicated in the introduction, there do exist 
non-measurable sets. 

To every measurable set $\Omega$, we associate the Lebesgue measure $m(\Omega)$ of $\Omega$, which
will obey the following properties:


(v) (Empty set) The empty set $\emptyset$ has measure $m(\emptyset) = 0$.
(vi) (Positivity) We have $0 \leq m(\Omega) \leq +\infty$ for every measurable set $\Omega$.
(vii) (Monotonicity) If $A \subseteq B$, and $A$ and $B$ are both measurable, then $m(A) \leq m(B)$.
(viii) (Finite sub-additivity) If $(A_j)_{j \in J}$ are a finite collection of measurable sets, then
$m\left(\bigcup_{j \in J} A_j\right) \leq \sum_{j \in J} m(A_j)$.
(ix) (Finite additivity) If $(A_j)_{j \in J}$ are a finite collection of disjoint measurable sets,
then $m\left(\bigcup_{j \in J} A_j\right) = \sum_{j \in J} m(A_j)$.
(x) (Countable sub-additivity) If $(A_j)_{j \in J}$ are a countable collection of measurable
sets, then $m\left(\bigcup_{j \in J} A_j\right) \leq \sum_{j \in J} m(A_j)$.
(xi) (Countable additivity) If $(A_j)_{j \in J}$ are a countable collection of disjoint
measurable sets, then $m\left(\bigcup_{j \in J} A_j\right) = \sum_{j \in J} m(A_j)$.
(xii) (Normalization) The unit cube $[0,1]^n = \{(x_1, \ldots, x_n) \in \mathbf{R}^n : 0 \leq x_j \leq 1$
for all $1 \leq j \leq n\}$ has measure $m([0,1]^n) = 1$.
(xiii) (Translation invariance) If $\Omega$ is a measurable set, and $x \in \mathbf{R}^n$, then
$x + \Omega := \{x + y : y \in \Omega\}$ is also measurable, and $m(x + \Omega) = m(\Omega)$.


Again, many of these properties are redundant; for instance the countable additiv- 
ity property can be used to deduce the finite additivity property, which in turn can be 
used to derive monotonicity (when combined with the positivity property). One can 
also obtain the sub-additivity properties from the additivity ones. Note that $m(\Omega)$ can
be $+\infty$, and so in particular some of the sums in the above properties may also equal
$+\infty$; in this chapter we adopt the convention that an infinite sum $\sum_{j \in J} a_j$ of
non-negative quantities $a_j$ is equal to $+\infty$ if the sum is not absolutely convergent. (Since
everything is non-negative we will never have to deal with indeterminate forms such
as $-\infty + +\infty$.)

Our goal for this chapter can then be stated thus: 
Theorem 7.1.1 (Existence of Lebesgue measure). There exists a concept of a
measurable set, and a way to assign a number $m(\Omega)$ to every measurable subset $\Omega \subseteq \mathbf{R}^n$,
which obeys all of the properties (i)-(xiii).


It turns out that Lebesgue measure is pretty much unique; any other concept of
measurability and measure which obeys axioms (i)-(xiii) will largely coincide with
the construction we give. However there are other measures which obey only some
of the above axioms; also, we may be interested in concepts of measure for other
domains than Euclidean spaces $\mathbf{R}^n$. This leads to measure theory, which is an entire
subject in itself and will not be pursued here; however we do remark that the concept 
of measures is very important in modern probability, and in the finer points of analysis 
(e.g., in the theory of distributions). 


7.2 First Attempt: Outer Measure 


Before we construct Lebesgue measure, we first discuss a somewhat naive approach 
to finding the measure of a set—namely, we try to cover the set by boxes, and then 
add up the volume of each box. This approach will almost work, giving us a concept 
called outer measure which can be applied to every set and obeys all of the properties 
(v)-(xiii) except for the additivity properties (ix), (xi). Later we will have to modify
outer measure slightly to recover the additivity property. 

We begin with the notion of an open box.


Definition 7.2.1 (Open box) An open box (or box for short) $B$ in $\mathbf{R}^n$ is any set of
the form

$$B = \prod_{i=1}^n (a_i, b_i) := \{(x_1, \ldots, x_n) \in \mathbf{R}^n : x_i \in (a_i, b_i) \text{ for all } 1 \leq i \leq n\},$$

where $b_i \geq a_i$ are real numbers. We define the volume $\operatorname{vol}(B)$ of this box to be the
number

$$\operatorname{vol}(B) := \prod_{i=1}^n (b_i - a_i) = (b_1 - a_1)(b_2 - a_2) \cdots (b_n - a_n).$$

For instance, the unit cube $(0,1)^n$ is a box, and has volume 1. In one dimension
$n = 1$, boxes are the same as open intervals. One can easily check that in general
dimension open boxes are indeed open. Note that if we have $b_i = a_i$ for some
$i$, then the box becomes empty, and has volume 0, but we still consider this to be a
box (albeit a rather silly one). Sometimes we will use $\operatorname{vol}_n(B)$ instead of $\operatorname{vol}(B)$ to
emphasize that we are dealing with $n$-dimensional volume, thus for instance $\operatorname{vol}_1(B)$
would be the length of a one-dimensional box $B$, $\operatorname{vol}_2(B)$ would be the area of a
two-dimensional box $B$, etc.
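
As a small illustration of Definition 7.2.1, here is a worked computation of the volume of one box in $\mathbf{R}^2$ and of one degenerate box. The particular endpoints are chosen only for illustration and do not come from the text.

```latex
% Illustrative endpoints (an assumption, not from the text):
% a_1 = 0, b_1 = 2, a_2 = 1, b_2 = 4.
\[
  B = (0,2) \times (1,4), \qquad
  \operatorname{vol}_2(B) = (2 - 0)(4 - 1) = 6 .
\]
% Degenerate case: b_2 = a_2 = 1, so the second factor (1,1) is empty.
\[
  B' = (0,2) \times (1,1) = \emptyset, \qquad
  \operatorname{vol}_2(B') = (2 - 0)(1 - 1) = 0 .
\]
```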

Remark 7.2.2. We of course expect the measure $m(B)$ of a box to be the same as the
volume $\operatorname{vol}(B)$ of that box. This is in fact an inevitable consequence of the axioms
(i)-(xiii) (see Exercise 7.2.5).


Definition 7.2.3. (Covering by boxes) Let $\Omega \subseteq \mathbf{R}^n$ be a subset of $\mathbf{R}^n$. We say that a
collection $(B_j)_{j \in J}$ of boxes cover $\Omega$ iff $\Omega \subseteq \bigcup_{j \in J} B_j$.


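Suppose $\Omega \subseteq \mathbf{R}^n$ can be covered by a finite or countable collection of boxes
$(B_j)_{j \in J}$. If we wish $\Omega$ to be measurable, and if we wish to have a measure obeying the
monotonicity and sub-additivity properties (vii), (viii), (x), and if we wish $m(B_j) =
\operatorname{vol}(B_j)$ for every box $B_j$, then we must have

$$m(\Omega) \leq m\Big(\bigcup_{j \in J} B_j\Big) \leq \sum_{j \in J} m(B_j) = \sum_{j \in J} \operatorname{vol}(B_j).$$

We thus conclude

$$m(\Omega) \leq \inf\Big\{ \sum_{j \in J} \operatorname{vol}(B_j) : (B_j)_{j \in J} \text{ covers } \Omega;\ J \text{ at most countable} \Big\}.$$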


Inspired by this, we define 


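Definition 7.2.4 (Outer measure) If $\Omega$ is a set, we define the outer measure $m^*(\Omega)$
of $\Omega$ to be the quantity

$$m^*(\Omega) := \inf\Big\{ \sum_{j \in J} \operatorname{vol}(B_j) : (B_j)_{j \in J} \text{ covers } \Omega;\ J \text{ at most countable} \Big\}.$$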


Since $\sum_{j \in J} \operatorname{vol}(B_j)$ is non-negative, we know that $m^*(\Omega) \geq 0$ for all $\Omega$. However,
it is quite possible that $m^*(\Omega)$ could equal $+\infty$. Note that because we are allowing
ourselves to use a countable number of boxes, every subset of $\mathbf{R}^n$ has at least
one countable cover by boxes; in fact $\mathbf{R}^n$ itself can be covered by countably many
translates of the unit cube $(0,1)^n$ (how?). We will sometimes write $m^*_n(\Omega)$ instead
of $m^*(\Omega)$ to emphasize the fact that we are using $n$-dimensional outer measure.
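
One possible way to see that countably many translates of $(0,1)^n$ cover $\mathbf{R}^n$ is sketched below. The choice of the half-integer lattice $\tfrac{1}{2}\mathbf{Z}^n$ as the set of translation vectors is our own; other choices work equally well.

```latex
% Sketch: translate (0,1)^n by the vectors k/2 with k in Z^n.
% Given x = (x_1, ..., x_n), choose integers k_i := \lceil 2 x_i \rceil - 1.
\[
  k_i < 2x_i \leq k_i + 1
  \quad\Longrightarrow\quad
  \frac{k_i}{2} < x_i \leq \frac{k_i}{2} + \frac{1}{2} < \frac{k_i}{2} + 1 ,
\]
% so x lies in the translate k/2 + (0,1)^n.  Hence
\[
  \mathbf{R}^n = \bigcup_{k \in \mathbf{Z}^n} \Big( \tfrac{1}{2} k + (0,1)^n \Big),
\]
% and the index set Z^n is countable.
```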

Note that outer measure can be defined for every single set (not just the measurable 
ones), because we can take the infimum of any non-empty set. It obeys several of the 
desired properties of a measure: 


Lemma 7.2.5 (Properties of outer measure) Outer measure has the following six
properties:


(v) (Empty set) The empty set $\emptyset$ has outer measure $m^*(\emptyset) = 0$.
(vi) (Positivity) We have $0 \leq m^*(\Omega) \leq +\infty$ for every set $\Omega \subseteq \mathbf{R}^n$.
(vii) (Monotonicity) If $A \subseteq B \subseteq \mathbf{R}^n$, then $m^*(A) \leq m^*(B)$.
(viii) (Finite sub-additivity) If $(A_j)_{j \in J}$ are a finite collection of subsets of $\mathbf{R}^n$, then
$m^*\left(\bigcup_{j \in J} A_j\right) \leq \sum_{j \in J} m^*(A_j)$.
(x) (Countable sub-additivity) If $(A_j)_{j \in J}$ are a countable collection of subsets of
$\mathbf{R}^n$, then $m^*\left(\bigcup_{j \in J} A_j\right) \leq \sum_{j \in J} m^*(A_j)$.
(xiii) (Translation invariance) If $\Omega$ is a subset of $\mathbf{R}^n$, and $x \in \mathbf{R}^n$, then $m^*(x + \Omega) =
m^*(\Omega)$.


Proof See Exercise 7.2.1. 


The outer measure of a closed box is also what we expect: 


Proposition 7.2.6 (Outer measure of closed box) For any closed box

$$B = \prod_{i=1}^n [a_i, b_i] := \{(x_1, \ldots, x_n) \in \mathbf{R}^n : x_i \in [a_i, b_i] \text{ for all } 1 \leq i \leq n\},$$

we have

$$m^*(B) = \prod_{i=1}^n (b_i - a_i).$$
i=1 


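Proof Clearly, we can cover the closed box $B = \prod_{i=1}^n [a_i, b_i]$ by the open box
$\prod_{i=1}^n (a_i - \varepsilon, b_i + \varepsilon)$ for every $\varepsilon > 0$. Thus we have

$$m^*(B) \leq \operatorname{vol}\Big(\prod_{i=1}^n (a_i - \varepsilon, b_i + \varepsilon)\Big) = \prod_{i=1}^n (b_i - a_i + 2\varepsilon)$$

for every $\varepsilon > 0$. Taking limits as $\varepsilon \to 0$, we obtain

$$m^*(B) \leq \prod_{i=1}^n (b_i - a_i).$$

To finish the proof, we need to show that

$$m^*(B) \geq \prod_{i=1}^n (b_i - a_i).$$

By the definition of $m^*(B)$, it suffices to show that

$$\sum_{j \in J} \operatorname{vol}(B_j) \geq \prod_{i=1}^n (b_i - a_i)$$

whenever $(B_j)_{j \in J}$ is a finite or countable cover of $B$.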

Since $B$ is closed and bounded, it is compact (by the Heine-Borel theorem,
Theorem 1.5.7), and in particular every open cover has a finite subcover (Theorem 1.5.8).
Thus to prove the above inequality for countable covers, it suffices to do it for finite
covers (since if $(B_j)_{j \in J'}$ is a finite subcover of $(B_j)_{j \in J}$ then $\sum_{j \in J} \operatorname{vol}(B_j)$ will be
greater than or equal to $\sum_{j \in J'} \operatorname{vol}(B_j)$).

To summarize, our goal is now to prove that

$$\sum_{j \in J} \operatorname{vol}(B^{(j)}) \geq \prod_{i=1}^n (b_i - a_i) \qquad (7.1)$$

whenever $(B^{(j)})_{j \in J}$ is a finite cover of $\prod_{i=1}^n [a_i, b_i]$; we have changed the subscript
$B_j$ to superscript $B^{(j)}$ because we will need the subscripts to denote components.

To prove the inequality (7.1), we shall use induction on the dimension $n$. First we
consider the base case $n = 1$. Here $B$ is just a closed interval $B = [a, b]$, and each
box $B^{(j)}$ is just an open interval $B^{(j)} = (a_j, b_j)$. We have to show that

$$\sum_{j \in J} (b_j - a_j) \geq (b - a).$$

To do this we use the Riemann integral. For each $j \in J$, let $f^{(j)} : \mathbf{R} \to \mathbf{R}$ be the
function such that $f^{(j)}(x) = 1$ when $x \in (a_j, b_j)$ and $f^{(j)}(x) = 0$ otherwise. Then we
have that $f^{(j)}$ is Riemann integrable (because it is piecewise constant, and compactly
supported) and

$$\int_{-\infty}^{\infty} f^{(j)} = b_j - a_j.$$

Summing this over all $j \in J$, and interchanging the integral with the finite sum, we
have

$$\int_{-\infty}^{\infty} \sum_{j \in J} f^{(j)} = \sum_{j \in J} (b_j - a_j).$$

But since the intervals $(a_j, b_j)$ cover $[a, b]$, we have $\sum_{j \in J} f^{(j)}(x) \geq 1$ for all $x \in
[a, b]$ (why?). For all other values of $x$, we have $\sum_{j \in J} f^{(j)}(x) \geq 0$. Thus

$$\int_{-\infty}^{\infty} \sum_{j \in J} f^{(j)} \geq \int_{[a,b]} 1 = b - a$$

and the claim follows by combining this inequality with the previous equality. This
proves (7.1) when $n = 1$.


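Now assume inductively that $n > 1$, and we have already proven the inequality
(7.1) for dimension $n - 1$. We shall use a similar argument to the preceding one.
Each box $B^{(j)}$ is now of the form

$$B^{(j)} = \prod_{i=1}^n (a_i^{(j)}, b_i^{(j)}).$$

We can write this as

$$B^{(j)} = A^{(j)} \times (a_n^{(j)}, b_n^{(j)})$$

where $A^{(j)}$ is the $(n-1)$-dimensional box $A^{(j)} := \prod_{i=1}^{n-1} (a_i^{(j)}, b_i^{(j)})$. Note that

$$\operatorname{vol}(B^{(j)}) = \operatorname{vol}_{n-1}(A^{(j)}) (b_n^{(j)} - a_n^{(j)})$$

where we have subscripted $\operatorname{vol}_{n-1}$ by $n - 1$ to emphasize that this is
$(n-1)$-dimensional volume being referred to here. We similarly write

$$B = A \times [a_n, b_n]$$

where $A := \prod_{i=1}^{n-1} [a_i, b_i]$, and again note that

$$\operatorname{vol}(B) = \operatorname{vol}_{n-1}(A) (b_n - a_n).$$

For each $j \in J$, let $f^{(j)}$ be the function such that $f^{(j)}(x_n) = \operatorname{vol}_{n-1}(A^{(j)})$ for all
$x_n \in (a_n^{(j)}, b_n^{(j)})$, and $f^{(j)}(x_n) = 0$ for all other $x_n$. Then $f^{(j)}$ is Riemann integrable
and

$$\int_{-\infty}^{\infty} f^{(j)} = \operatorname{vol}_{n-1}(A^{(j)}) (b_n^{(j)} - a_n^{(j)}) = \operatorname{vol}(B^{(j)})$$

and hence

$$\sum_{j \in J} \operatorname{vol}(B^{(j)}) = \int_{-\infty}^{\infty} \sum_{j \in J} f^{(j)}.$$

Now let $x_n \in [a_n, b_n]$ and $(x_1, \ldots, x_{n-1}) \in A$. Then $(x_1, \ldots, x_n)$ lies in $B$, and hence
lies in one of the $B^{(j)}$. Clearly we have $x_n \in (a_n^{(j)}, b_n^{(j)})$, and $(x_1, \ldots, x_{n-1}) \in A^{(j)}$.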


In particular, we see that for each $x_n \in [a_n, b_n]$, the set

$$\{A^{(j)} : j \in J;\ x_n \in (a_n^{(j)}, b_n^{(j)})\}$$

of $(n-1)$-dimensional boxes covers $A$. Applying the inductive hypothesis (7.1) at
dimension $n - 1$ we thus see that

$$\sum_{j \in J : x_n \in (a_n^{(j)}, b_n^{(j)})} \operatorname{vol}_{n-1}(A^{(j)}) \geq \operatorname{vol}_{n-1}(A),$$

or in other words

$$\sum_{j \in J} f^{(j)}(x_n) \geq \operatorname{vol}_{n-1}(A).$$

Integrating this over $[a_n, b_n]$, we obtain

$$\int_{[a_n, b_n]} \sum_{j \in J} f^{(j)} \geq \operatorname{vol}_{n-1}(A) (b_n - a_n) = \operatorname{vol}(B)$$

and in particular

$$\int_{-\infty}^{\infty} \sum_{j \in J} f^{(j)} \geq \operatorname{vol}_{n-1}(A) (b_n - a_n) = \operatorname{vol}(B)$$

since $\sum_{j \in J} f^{(j)}$ is always non-negative. Combining this with our previous identity
for $\int_{-\infty}^{\infty} \sum_{j \in J} f^{(j)}$ we obtain (7.1), and the induction is complete.


Once we obtain the measure of a closed box, the corresponding result for an open 
box is easy: 


Corollary 7.2.7 For any open box

$$B = \prod_{i=1}^n (a_i, b_i) := \{(x_1, \ldots, x_n) \in \mathbf{R}^n : x_i \in (a_i, b_i) \text{ for all } 1 \leq i \leq n\},$$

we have

$$m^*(B) = \prod_{i=1}^n (b_i - a_i).$$

In particular, outer measure obeys the normalization (xii).


Proof We may assume that $b_i > a_i$ for all $i$, since if $b_i = a_i$ this follows from Lemma
7.2.5(v). Now observe that

$$\prod_{i=1}^n [a_i + \varepsilon, b_i - \varepsilon] \subseteq \prod_{i=1}^n (a_i, b_i) \subseteq \prod_{i=1}^n [a_i, b_i]$$

for all $\varepsilon > 0$, assuming that $\varepsilon$ is small enough that $b_i - \varepsilon > a_i + \varepsilon$ for all $i$. Applying
Proposition 7.2.6 and Lemma 7.2.5(vii) we obtain

$$\prod_{i=1}^n (b_i - a_i - 2\varepsilon) \leq m^*\Big(\prod_{i=1}^n (a_i, b_i)\Big) \leq \prod_{i=1}^n (b_i - a_i).$$

Sending $\varepsilon \to 0$ and using the squeeze test (Corollary 6.4.14), one obtains the result.


We now compute some examples of outer measure on the real line R. 


Example 7.2.8 Let us compute the one-dimensional outer measure of $\mathbf{R}$. Since $(-R, R) \subseteq
\mathbf{R}$ for all $R > 0$, we have

$$m^*(\mathbf{R}) \geq m^*((-R, R)) = 2R$$

by Corollary 7.2.7. Letting $R \to +\infty$ we thus see that $m^*(\mathbf{R}) = +\infty$.


Example 7.2.9 Now let us compute the one-dimensional measure of $\mathbf{Q}$. From
Proposition 7.2.6 we see that for each rational number $q$, the point $\{q\}$ has outer measure
$m^*(\{q\}) = 0$. Since $\mathbf{Q}$ is clearly the union $\mathbf{Q} = \bigcup_{q \in \mathbf{Q}} \{q\}$ of all these rational points
$q$, and $\mathbf{Q}$ is countable, we have

$$m^*(\mathbf{Q}) \leq \sum_{q \in \mathbf{Q}} m^*(\{q\}) = \sum_{q \in \mathbf{Q}} 0 = 0,$$

and so $m^*(\mathbf{Q})$ must equal zero. In fact, the same argument shows that every countable
set has measure zero. (This, incidentally, gives another proof that the real numbers
are uncountable, Corollary 8.3.4.)


Remark 7.2.10 One consequence of the fact that $m^*(\mathbf{Q}) = 0$ is that given any $\varepsilon > 0$,
it is possible to cover the rationals $\mathbf{Q}$ by a countable number of intervals whose total
length is less than $\varepsilon$. This fact is somewhat un-intuitive; can you find a more explicit
way to construct such a countable covering of $\mathbf{Q}$ by short intervals?
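
One possible explicit construction for Remark 7.2.10 is sketched below. It assumes a fixed enumeration $q_1, q_2, q_3, \ldots$ of $\mathbf{Q}$ (which exists because $\mathbf{Q}$ is countable); the names $q_k$ and $I_k$ are ours and are not notation from the text.

```latex
% Assumption: q_1, q_2, q_3, ... is some enumeration of the rationals.
% Cover the k-th rational by an open interval of length eps / 2^{k+1}:
\[
  I_k := \Big( q_k - \frac{\varepsilon}{2^{k+2}},\; q_k + \frac{\varepsilon}{2^{k+2}} \Big),
  \qquad
  \operatorname{vol}_1(I_k) = \frac{\varepsilon}{2^{k+1}} .
\]
% Every rational lies in its own interval, and the total length is a
% geometric series:
\[
  \mathbf{Q} \subseteq \bigcup_{k=1}^{\infty} I_k ,
  \qquad
  \sum_{k=1}^{\infty} \operatorname{vol}_1(I_k)
  = \varepsilon \sum_{k=1}^{\infty} 2^{-(k+1)}
  = \frac{\varepsilon}{2} < \varepsilon .
\]
```

Even though the rationals are dense, the intervals shrink so quickly that their total length stays below $\varepsilon$.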


Example 7.2.11 Now let us compute the one-dimensional measure of the irrationals
$\mathbf{R} \setminus \mathbf{Q}$. From finite sub-additivity we have

$$m^*(\mathbf{R}) \leq m^*(\mathbf{R} \setminus \mathbf{Q}) + m^*(\mathbf{Q}).$$

Since $\mathbf{Q}$ has outer measure 0, and $\mathbf{R}$ has outer measure $+\infty$, we thus see that the
irrationals $\mathbf{R} \setminus \mathbf{Q}$ have outer measure $+\infty$. A similar argument shows that $[0,1] \setminus \mathbf{Q}$,
the irrationals in $[0,1]$, have outer measure 1 (why?).
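
The "similar argument" for $[0,1] \setminus \mathbf{Q}$ can be sketched as follows, using only Proposition 7.2.6, Lemma 7.2.5 and Example 7.2.9. This is one way to organize the computation, not necessarily the intended one.

```latex
% Lower bound: finite sub-additivity, plus m^*([0,1] \cap Q) <= m^*(Q) = 0
% by monotonicity and Example 7.2.9.
\[
  1 = m^*([0,1])
  \leq m^*([0,1] \setminus \mathbf{Q}) + m^*([0,1] \cap \mathbf{Q})
  = m^*([0,1] \setminus \mathbf{Q}) + 0 .
\]
% Upper bound: monotonicity, since [0,1] \setminus Q is a subset of [0,1].
\[
  m^*([0,1] \setminus \mathbf{Q}) \leq m^*([0,1]) = 1 .
\]
% Combining the two bounds gives m^*([0,1] \setminus Q) = 1.
```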


Example 7.2.12 By Proposition 7.2.6, the unit interval $[0,1]$ in $\mathbf{R}$ has
one-dimensional outer measure 1, but the unit interval $\{(x, 0) : 0 \leq x \leq 1\}$ in $\mathbf{R}^2$
has two-dimensional outer measure 0. Thus one-dimensional outer measure and
two-dimensional outer measure are quite different. Note that the above remarks and
countable sub-additivity imply that the entire $x$-axis of $\mathbf{R}^2$ has two-dimensional outer
measure 0, despite the fact that $\mathbf{R}$ has infinite one-dimensional measure.
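
The two claims in Example 7.2.12 can be checked by the following short computation, in which the segment is viewed as a degenerate closed box; the decomposition of the $x$-axis into unit segments indexed by $k \in \mathbf{Z}$ is our own choice.

```latex
% The segment is the closed box [0,1] x [0,0], so by Proposition 7.2.6:
\[
  m^*_2\big([0,1] \times [0,0]\big) = (1 - 0)(0 - 0) = 0 .
\]
% The x-axis is a countable union of such segments, each with outer
% measure 0 for the same reason, so by countable sub-additivity:
\[
  m^*_2\big(\mathbf{R} \times \{0\}\big)
  \leq \sum_{k \in \mathbf{Z}} m^*_2\big([k, k+1] \times [0,0]\big)
  = \sum_{k \in \mathbf{Z}} 0 = 0 .
\]
```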

— Exercises —


Exercise 7.2.1 Prove Lemma 7.2.5. (Hint: you will have to use the definition of inf,
and probably introduce a parameter $\varepsilon$. You may have to treat separately the cases
when certain outer measures are equal to $+\infty$. (viii) can be deduced from (x) and
(v). For (x), label the index set $J$ as $J = \{j_1, j_2, j_3, \ldots\}$, and for each $A_j$, pick a
covering of $A_j$ by boxes whose total volume is no larger than $m^*(A_j) + \varepsilon/2^j$.)


Exercise 7.2.2 Let $A$ be a subset of $\mathbf{R}^n$, and let $B$ be a subset of $\mathbf{R}^m$. Note
that the Cartesian product $\{(a, b) : a \in A, b \in B\}$ is then a subset of $\mathbf{R}^{n+m}$. Show
that $m^*_{n+m}(A \times B) \leq m^*_n(A) m^*_m(B)$. Here we adopt the convention that $c \times +\infty =
+\infty \times c$ is equal to $+\infty$ for any $0 < c \leq +\infty$, and equal to zero for $c = 0$. (It
is in fact true that $m^*_{n+m}(A \times B) = m^*_n(A) m^*_m(B)$, but this is substantially harder to
prove.)


In Exercises 7.2.3-7.2.5, we assume that $\mathbf{R}^n$ is a Euclidean space, and we have a
notion of measurable set in $\mathbf{R}^n$ (which may or may not coincide with the notion of
Lebesgue measurable set) and a notion of measure (which may or may not coincide
with Lebesgue measure) which obeys axioms (i)-(xiii).


Exercise 7.2.3 (a) Show that if $A_1 \subseteq A_2 \subseteq A_3 \ldots$ is an increasing sequence of
measurable sets (so $A_j \subseteq A_{j+1}$ for every positive integer $j$), then we have
$m\left(\bigcup_{j=1}^\infty A_j\right) = \lim_{j \to \infty} m(A_j)$.

(b) Show that if $A_1 \supseteq A_2 \supseteq A_3 \ldots$ is a decreasing sequence of measurable sets
(so $A_j \supseteq A_{j+1}$ for every positive integer $j$), and $m(A_1) < +\infty$, then we have
$m\left(\bigcap_{j=1}^\infty A_j\right) = \lim_{j \to \infty} m(A_j)$.


Exercise 7.2.4 Show that for any positive integer $q > 1$, the open box

$$(0, 1/q)^n := \{(x_1, \ldots, x_n) \in \mathbf{R}^n : 0 < x_j < 1/q \text{ for all } 1 \leq j \leq n\}$$

and the closed box

$$[0, 1/q]^n := \{(x_1, \ldots, x_n) \in \mathbf{R}^n : 0 \leq x_j \leq 1/q \text{ for all } 1 \leq j \leq n\}$$

both have measure $q^{-n}$. (Hint: first show that $m((0, 1/q)^n) \leq q^{-n}$ for every $q \geq 1$ by
covering $(0,1)^n$ by some translates of $(0, 1/q)^n$. Using a similar argument, show that
$m([0, 1/q]^n) \geq q^{-n}$. Then show that $m([0, 1/q]^n \setminus (0, 1/q)^n) \leq \varepsilon$ for every $\varepsilon > 0$,
by covering the boundary of $[0, 1/q]^n$ with some very small boxes.)


Exercise 7.2.5 Show that for any box B, that m(B) = vol(B). (Hint: first prove 
this when the co-ordinates a;, b; are rational, using Exercise 7.2.4. Then take limits 
somehow (perhaps using Q1) to obtain the general case when the co-ordinates are 
real.) 


Exercise 7.2.6 Use Lemma 7.2.5 and Proposition 7.2.6 to furnish another proof that 
the reals are uncountable (i.e., reprove Corollary 8.3.4 from Analysis I). 


7.3 Outer Measure Is not Additive


In light of Lemma 7.2.5, it would seem now that all we need to do is to verify the addi- 
tivity properties (ix), (xi), and we have everything we need to have a usable measure. 
Unfortunately, these properties fail for outer measure, even in one dimension. 


Proposition 7.3.1 (Failure of countable additivity) There exists a countable
collection $(A_j)_{j \in J}$ of disjoint subsets of $\mathbf{R}$, such that $m^*\left(\bigcup_{j \in J} A_j\right) \neq \sum_{j \in J} m^*(A_j)$.


Proof We shall need some notation. Let $\mathbf{Q}$ be the rationals, and $\mathbf{R}$ be the reals. We
say that a set $A \subseteq \mathbf{R}$ is a coset of $\mathbf{Q}$ if it is of the form $A = x + \mathbf{Q}$ for some real
number $x$. For instance, $\sqrt{2} + \mathbf{Q}$ is a coset of $\mathbf{Q}$, as is $\mathbf{Q}$ itself, since $\mathbf{Q} = 0 + \mathbf{Q}$.
Note that a coset $A$ can correspond to several values of $x$; for instance $2 + \mathbf{Q}$ is
exactly the same coset as $0 + \mathbf{Q}$. Also observe that it is not possible for two cosets to
partially overlap; if $x + \mathbf{Q}$ intersects $y + \mathbf{Q}$ in even just a single point $z$, then $x - y$
must be rational (why? Use the identity $x - y = (x - z) - (y - z)$), and thus $x + \mathbf{Q}$
and $y + \mathbf{Q}$ must be equal (why?). So any two cosets are either identical or disjoint.

We observe that every coset $A$ of the rationals $\mathbf{Q}$ has a non-empty intersection
with $[0,1]$. Indeed, if $A$ is a coset, then $A = x + \mathbf{Q}$ for some real number $x$. If we
then pick a rational number $q$ in $[-x, 1 - x]$ then we see that $x + q \in [0,1]$, and
thus $A \cap [0,1]$ contains $x + q$.

Let $\mathbf{R}/\mathbf{Q}$ denote the set of all cosets of $\mathbf{Q}$; note that this is a set whose elements are
themselves sets (of real numbers). For each coset $A$ in $\mathbf{R}/\mathbf{Q}$, let us pick an element
$x_A$ of $A \cap [0,1]$. (This requires us to make an infinite number of choices, and thus
requires the axiom of choice, see Sect. 8.4.) Let $E$ be the set of all such $x_A$, i.e.,
$E := \{x_A : A \in \mathbf{R}/\mathbf{Q}\}$. Note that $E \subseteq [0,1]$ by construction.

Now consider the set

$$X := \bigcup_{q \in \mathbf{Q} \cap [-1,1]} (q + E).$$

Clearly this set is contained in $[-1, 2]$ (since $q + x \in [-1, 2]$ whenever $q \in [-1, 1]$
and $x \in E \subseteq [0,1]$). We claim that this set contains the interval $[0,1]$. Indeed, for
any $y \in [0,1]$, we know that $y$ must belong to some coset $A$ (for instance, it belongs
to the coset $y + \mathbf{Q}$). But we also have $x_A$ belonging to the same coset, and thus
$y - x_A$ is equal to some rational $q$. Since $y$ and $x_A$ both live in $[0,1]$, then $q$ lives in
$[-1, 1]$. Since $y = q + x_A$, we have $y \in q + E$, and hence $y \in X$ as desired.

Note that the translates $q + E$ for $q \in \mathbf{Q}$ are all disjoint. For, if there were two
distinct $q, q' \in \mathbf{Q}$ with $q + E$ intersecting $q' + E$, then there would be $A, A' \in \mathbf{R}/\mathbf{Q}$
such that $q + x_A = q' + x_{A'}$. But then $A = x_A + \mathbf{Q} = x_{A'} + \mathbf{Q} = A'$ and thus $x_A =
x_{A'}$, which implies that $q = q'$, contradicting the hypothesis.

We claim that

$$m^*(X) \neq \sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(q + E),$$

which would prove the proposition. To see why this is true, observe that since $[0,1] \subseteq
X \subseteq [-1, 2]$, we have $1 \leq m^*(X) \leq 3$ by monotonicity and Proposition 7.2.6.
For the right-hand side, observe from translation invariance that

$$\sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(q + E) = \sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(E).$$

The set $\mathbf{Q} \cap [-1, 1]$ is countably infinite (why?). Thus the right-hand side is either
0 (if $m^*(E) = 0$) or $+\infty$ (if $m^*(E) > 0$). Either way, it cannot be between 1 and 3,
and the claim follows.
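
The dichotomy used at the end of the proof above (the sum is either $0$ or $+\infty$) can be spelled out as follows. The symbols $c$ and $q_1, q_2, \ldots$ are ours: $c := m^*(E)$, and $q_1, q_2, \ldots$ is an assumed enumeration of the countably infinite set $\mathbf{Q} \cap [-1,1]$.

```latex
% Partial sums of a countably infinite sum of the same constant c >= 0:
\[
  \sum_{k=1}^{N} m^*(E) = N c ,
  \qquad
  \sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(E)
  = \lim_{N \to \infty} N c
  =
  \begin{cases}
    0        & \text{if } c = 0, \\
    +\infty  & \text{if } c > 0 .
  \end{cases}
\]
% Neither value lies in the interval [1, 3] that contains m^*(X).
```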


Remark 7.3.2. The above proof used the axiom of choice. This turns out to be abso- 
lutely necessary; one can prove using some advanced techniques in mathematical 
logic that if one does not assume the axiom of choice, then it is possible to have a 
mathematical model where outer measure is countably additive. 


One can refine the above argument, and show in fact that m* is not finitely additive 
either: 


Proposition 7.3.3 (Failure of finite additivity) There exists a finite collection $(A_j)_{j \in J}$
of disjoint subsets of $\mathbf{R}$, such that

$$m^*\Big(\bigcup_{j \in J} A_j\Big) \neq \sum_{j \in J} m^*(A_j).$$

Proof This is accomplished by an indirect argument. Suppose for sake of
contradiction that $m^*$ was finitely additive. Let $E$ and $X$ be the sets introduced in Proposition
7.3.1. From countable sub-additivity and translation invariance we have

$$m^*(X) \leq \sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(q + E) = \sum_{q \in \mathbf{Q} \cap [-1,1]} m^*(E).$$

Since we know that $1 \leq m^*(X) \leq 3$, we thus have $m^*(E) \neq 0$, since otherwise we
would have $m^*(X) \leq 0$, a contradiction.

Since $m^*(E) \neq 0$, there exists a finite integer $n > 0$ such that $m^*(E) > 1/n$. Now
let $J$ be a finite subset of $\mathbf{Q} \cap [-1, 1]$ of cardinality $3n$. If $m^*$ were finitely additive,
then we would have

$$m^*\Big(\bigcup_{q \in J} (q + E)\Big) = \sum_{q \in J} m^*(q + E) = \sum_{q \in J} m^*(E) > 3n \cdot \frac{1}{n} = 3.$$

But we know that $\bigcup_{q \in J} (q + E)$ is a subset of $X$, which has outer measure at most 3.
This contradicts monotonicity. Hence $m^*$ cannot be finitely additive.


Remark 7.3.4 The examples here are related to the Banach-Tarski paradox, which
demonstrates (using the axiom of choice) that one can partition the unit ball in $\mathbf{R}^3$
into a finite number of pieces which, when rotated and translated, can be reassembled
to form two complete unit balls! Of course, this partition involves non-measurable
sets. We will not present this paradox here as it requires some group theory which is
beyond the scope of this text.


7.4 Measurable Sets 


In the previous section we saw that certain sets were badly behaved with respect 
to outer measure, in particular they could be used to contradict finite or countable 
additivity. However, those sets were rather pathological, being constructed using the 
axiom of choice and looking rather artificial. One would hope to be able to exclude 
them and then somehow recover finite and countable additivity. Fortunately, this can 
be done, thanks to a clever definition of Constantin Carathéodory (1873-1950): 


Definition 7.4.1 (Lebesgue measurability) Let $E$ be a subset of $\mathbf{R}^n$. We say that $E$
is Lebesgue measurable, or measurable for short, iff we have the identity

$$m^*(A) = m^*(A \cap E) + m^*(A \setminus E)$$

for every subset $A$ of $\mathbf{R}^n$. If $E$ is measurable, we define the Lebesgue measure of $E$
to be $m(E) = m^*(E)$; if $E$ is not measurable, we leave $m(E)$ undefined.


In other words, $E$ being measurable means that if we use the set $E$ to divide up an
arbitrary set $A$ into two parts, we keep the additivity property. Of course, if $m^*$ were
finitely additive then every set $E$ would be measurable; but we know from Proposition
7.3.3 that $m^*$ is not finitely additive. One can think of the measurable sets as
the sets for which finite additivity works. We sometimes subscript $m(E)$ as $m_n(E)$
to emphasize the fact that we are using $n$-dimensional Lebesgue measure.
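
As a sanity check of Definition 7.4.1, the two simplest cases $E = \emptyset$ and $E = \mathbf{R}^n$ can be verified directly. The sketch below uses only $m^*(\emptyset) = 0$ from Lemma 7.2.5(v); here $A$ is an arbitrary subset of $\mathbf{R}^n$.

```latex
% Case E = \emptyset: A \cap E is empty and A \setminus E = A.
\[
  m^*(A \cap \emptyset) + m^*(A \setminus \emptyset)
  = m^*(\emptyset) + m^*(A)
  = 0 + m^*(A) = m^*(A) .
\]
% Case E = R^n: A \cap E = A and A \setminus E is empty.
\[
  m^*(A \cap \mathbf{R}^n) + m^*(A \setminus \mathbf{R}^n)
  = m^*(A) + m^*(\emptyset)
  = m^*(A) .
\]
```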

The above definition is somewhat hard to work with, and in practice one does 
not verify a set is measurable directly from this definition. Instead, we will use this 
definition to prove various useful properties of measurable sets (Lemmas 7.4.2-7.4.11),
and after that we will rely more or less exclusively on the properties in those
lemmas, and no longer need to refer to the above definition. 

We begin by showing that a large number of sets are indeed measurable. The
empty set $E = \emptyset$ and the whole space $E = \mathbf{R}^n$ are clearly measurable (why?). Here
is another example of a measurable set:


Lemma 7.4.2 (Half-spaces are measurable) The half-space

$$\{(x_1, \ldots, x_n) \in \mathbf{R}^n : x_n > 0\}$$

is measurable.


Proof See Exercise 7.4.3. 

Remark 7.4.3. A similar argument will also show that any half-space of the form
$\{(x_1, \ldots, x_n) \in \mathbf{R}^n : x_j > 0\}$ or $\{(x_1, \ldots, x_n) \in \mathbf{R}^n : x_j < 0\}$ for some $1 \leq j \leq n$ is
measurable.


Now for some more properties of measurable sets. 


Lemma 7.4.4 (Properties of measurable sets) 


(a) If $E$ is measurable, then $\mathbf{R}^n \setminus E$ is also measurable.

(b) (Translation invariance) If $E$ is measurable, and $x \in \mathbf{R}^n$, then $x + E$ is also
measurable, and $m(x + E) = m(E)$.

(c) If $E_1$ and $E_2$ are measurable, then $E_1 \cap E_2$ and $E_1 \cup E_2$ are measurable.

(d) (Boolean algebra property) If $E_1, E_2, \ldots, E_N$ are measurable, then $\bigcup_{j=1}^N E_j$
and $\bigcap_{j=1}^N E_j$ are measurable.

(e) Every open box, and every closed box, is measurable.

(f) Any set $E$ of outer measure zero (i.e., $m^*(E) = 0$) is measurable.


Proof See Exercise 7.4.4. 




From Lemma 7.4.4, we have proven properties (ii), (iii), (xiii) on our wish list of 
measurable sets, and we are making progress toward (i). We also have finite additivity 
(property (ix) on our wish list): 


Lemma 7.4.5 (Finite additivity) If $(E_j)_{j \in J}$ are a finite collection of disjoint
measurable sets, then for any set $A$ (not necessarily measurable), we have

$$m^*\Big(A \cap \bigcup_{j \in J} E_j\Big) = \sum_{j \in J} m^*(A \cap E_j).$$

Furthermore, we have $m\left(\bigcup_{j \in J} E_j\right) = \sum_{j \in J} m(E_j)$.


Proof See Exercise 7.4.6. 


Remark 7.4.6 Lemma 7.4.5 and Proposition 7.3.3, when combined, imply that there 
exist non-measurable sets: see Exercise 7.4.5. 


Corollary 7.4.7 If $A \subseteq B$ are two measurable sets, then $B \setminus A$ is also measurable,
and

$$m(B \setminus A) + m(A) = m(B).$$


Proof See Exercise 7.4.7.

Now we show countable additivity.


Lemma 7.4.8 (Countable additivity) If $(E_j)_{j \in J}$ are a countable collection of
disjoint measurable sets, then $\bigcup_{j \in J} E_j$ is measurable, and
$m\left(\bigcup_{j \in J} E_j\right) = \sum_{j \in J} m(E_j)$.

Proof Let $E := \bigcup_{j \in J} E_j$. Our first task will be to show that $E$ is measurable. Thus,
let $A$ be an arbitrary set (not necessarily measurable); we need to show that

$$m^*(A) = m^*(A \cap E) + m^*(A \setminus E).$$

Since $J$ is countable, we may write $J = \{j_1, j_2, j_3, \ldots\}$. Note that

$$A \cap E = \bigcup_{k=1}^{\infty} (A \cap E_{j_k})$$

(why?) and hence by countable sub-additivity

$$m^*(A \cap E) \leq \sum_{k=1}^{\infty} m^*(A \cap E_{j_k}).$$

We rewrite this as

$$m^*(A \cap E) \leq \sup_{N \geq 1} \sum_{k=1}^{N} m^*(A \cap E_{j_k}).$$

Let $F_N$ be the set $F_N := \bigcup_{k=1}^N E_{j_k}$. Since the $A \cap E_{j_k}$ are all disjoint, and their
union is $A \cap F_N$, we see from Lemma 7.4.5 that

$$\sum_{k=1}^{N} m^*(A \cap E_{j_k}) = m^*(A \cap F_N)$$

and hence

$$m^*(A \cap E) \leq \sup_{N \geq 1} m^*(A \cap F_N).$$

Now we look at $A \setminus E$. Since $F_N \subseteq E$ (why?), we have $A \setminus E \subseteq A \setminus F_N$ (why?). By
monotonicity, we thus have

$$m^*(A \setminus E) \leq m^*(A \setminus F_N)$$

for all $N$. In particular, we see that

$$m^*(A \cap E) + m^*(A \setminus E) \leq \sup_{N \geq 1} \big(m^*(A \cap F_N) + m^*(A \setminus E)\big) \leq \sup_{N \geq 1} \big(m^*(A \cap F_N) + m^*(A \setminus F_N)\big).$$

But from Lemma 7.4.4(d) we know that $F_N$ is measurable, and hence

$$m^*(A \cap F_N) + m^*(A \setminus F_N) = m^*(A).$$

Putting this all together we obtain

$$m^*(A \cap E) + m^*(A \setminus E) \leq m^*(A).$$

But from finite sub-additivity we have

$$m^*(A \cap E) + m^*(A \setminus E) \geq m^*(A)$$

and the claim follows. This shows that $E$ is measurable.

To finish the lemma, we need to show that $m(E)$ is equal to $\sum_{j \in J} m(E_j)$. We first
observe from countable sub-additivity that

$$m(E) \leq \sum_{j \in J} m(E_j) = \sum_{k=1}^{\infty} m(E_{j_k}).$$

On the other hand, by finite additivity and monotonicity we have

$$m(E) \geq m(F_N) = \sum_{k=1}^{N} m(E_{j_k}).$$

Taking limits as $N \to \infty$ we obtain

$$m(E) \geq \sum_{k=1}^{\infty} m(E_{j_k})$$

and thus we have

$$m(E) = \sum_{k=1}^{\infty} m(E_{j_k}) = \sum_{j \in J} m(E_j)$$

as desired.


This proves property (xi) on our wish list. Next, we do countable unions and 
intersections. 


Lemma 7.4.9 ($\sigma$-algebra property) If $(\Omega_j)_{j \in J}$ are any countable collection of
measurable sets (so $J$ is countable), then the union $\bigcup_{j \in J} \Omega_j$ and the intersection
$\bigcap_{j \in J} \Omega_j$ are also measurable.


Proof See Exercise 7.4.8. 


The final property left to verify on our wish list is (i). We first need a preliminary
lemma.


Lemma 7.4.10 Every open set can be written as a countable or finite union of open 
boxes. 

Proof We first need some notation. Call a box $B = \prod_{i=1}^n (a_i, b_i)$ rational if all of
its components $a_i, b_i$ are rational numbers. Observe that there are only a countable
number of rational boxes (this is since a rational box is described by $2n$ rational
numbers, and so has the same cardinality as $\mathbf{Q}^{2n}$. But $\mathbf{Q}$ is countable, and the Cartesian
product of any finite number of countable sets is countable; see Corollaries 8.1.14,
8.1.15).

We make the following claim: given any open ball $B(x, r)$, there exists a rational
box $B$ which is contained in $B(x, r)$ and which contains $x$. To prove this claim, write
$x = (x_1, \ldots, x_n)$. For each $1 \leq i \leq n$, let $a_i$ and $b_i$ be rational numbers such that

$$x_i - \frac{r}{n} < a_i < x_i < b_i < x_i + \frac{r}{n}.$$

Then it is clear that the box $\prod_{i=1}^n (a_i, b_i)$ is rational and contains $x$. A simple
computation using Pythagoras' theorem (or the triangle inequality) also shows that this
box is contained in $B(x, r)$; we leave this to the reader.
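
The computation left to the reader can be sketched as follows, assuming $B(x, r)$ denotes the ball in the usual Euclidean metric on $\mathbf{R}^n$. Here $y = (y_1, \ldots, y_n)$ is our name for an arbitrary point of the box $\prod_{i=1}^n (a_i, b_i)$.

```latex
% Each coordinate of y is within r/n of the corresponding coordinate of x:
\[
  x_i - \frac{r}{n} < a_i < y_i < b_i < x_i + \frac{r}{n}
  \quad\Longrightarrow\quad
  |y_i - x_i| < \frac{r}{n} .
\]
% Pythagoras' theorem then bounds the Euclidean distance:
\[
  |y - x| = \Big( \sum_{i=1}^{n} (y_i - x_i)^2 \Big)^{1/2}
  < \Big( n \cdot \frac{r^2}{n^2} \Big)^{1/2}
  = \frac{r}{\sqrt{n}} \leq r ,
\]
% so y lies in B(x, r).
```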

Now let $E$ be an open set, and let $\Sigma$ be the set of all rational boxes $B$ which are
subsets of $E$, and consider the union $\bigcup_{B \in \Sigma} B$ of all those boxes. Clearly, this union is
contained in $E$, since every box in $\Sigma$ is contained in $E$ by construction. On the other
hand, since $E$ is open, we see that for every $x \in E$ there is a ball $B(x, r)$ contained
in $E$, and by the previous claim this ball contains a rational box which contains $x$.
In particular, $x$ is contained in $\bigcup_{B \in \Sigma} B$. Thus we have

$$E = \bigcup_{B \in \Sigma} B$$

as desired; note that $\Sigma$ is countable or finite because it is a subset of the set of all
rational boxes, which is countable.


Lemma 7.4.11 (Borel property) Every open set, and every closed set, is Lebesgue 
measurable. 


Proof It suffices to do this for open sets, since the claim for closed sets then follows
by Lemma 7.4.4(a) (i.e., property (ii)). Let $E$ be an open set. By Lemma 7.4.10, $E$
is a countable union of boxes. Since we already know that boxes are measurable,
and that the countable union of measurable sets is measurable, the claim follows.


The construction of Lebesgue measure and its basic properties are now complete. 
Now we make the next step in constructing the Lebesgue integral—describing the 
class of functions we can integrate. 


— Exercises — 


Exercise 7.4.1 If $A$ is an open interval in $\mathbf{R}$, show that $m^*(A) = m^*(A \cap (0, \infty)) +
m^*(A \setminus (0, \infty))$.

Exercise 7.4.2 If $A$ is an open box in $\mathbf{R}^n$, and $E$ is the half-space $E := \{(x_1, \ldots, x_n) \in
\mathbf{R}^n : x_n > 0\}$, show that $m^*(A) = m^*(A \cap E) + m^*(A \setminus E)$. (Hint: use Exercise
7.4.1.)

Exercise 7.4.3 Prove Lemma 7.4.2. (Hint: use Exercise 7.4.2.)


Exercise 7.4.4 Prove Lemma 7.4.4. (Hints: for (c), first prove that

$$m^*(A) = m^*(A \cap E_1 \cap E_2) + m^*(A \cap E_1 \setminus E_2) + m^*(A \cap E_2 \setminus E_1) + m^*(A \setminus (E_1 \cup E_2)).$$

A Venn diagram may be helpful. Also you may need the finite sub-additivity property.
Use (c) to prove (d), and use (b) and (d) and the various versions of Lemma 7.4.2 to prove
(e).)


Exercise 7.4.5 Show that the set E used in the proof of Propositions 7.3.1 and 7.3.3 
is non-measurable. 


Exercise 7.4.6 Prove Lemma 7.4.5. 
Exercise 7.4.7 Use Lemma 7.4.5 to prove Corollary 7.4.7. 


Exercise 7.4.8 Prove Lemma 7.4.9. (Hint: for the countable union problem, write
$J = \{j_1, j_2, \ldots\}$, write $F_N := \bigcup_{k=1}^N \Omega_{j_k}$, and write $E_N := F_N \setminus F_{N-1}$, with the
understanding that $F_0$ is the empty set. Then apply Lemma 7.4.8. For the countable
intersection problem, use what you just did and Lemma 7.4.4(a).)


Exercise 7.4.9 Let $A \subseteq \mathbf{R}^2$ be the set $A := [0,1]^2 \setminus \mathbf{Q}^2$; i.e., $A$ consists of all the
points $(x, y)$ in $[0,1]^2$ such that $x$ and $y$ are not both rational. Show that $A$ is
measurable and $m(A) = 1$, but that $A$ has no interior points. (Hint: it's easier to use
the properties of outer measure and measure, including those in the exercises above,
than to try to do this problem from first principles.)


Exercise 7.4.10 Let $A \subseteq B \subseteq \mathbf{R}^n$. Show that if $B$ is Lebesgue measurable with
measure zero, then $A$ is also Lebesgue measurable with measure zero.


7.5 Measurable Functions 


In the theory of the Riemann integral, we are only able to integrate a certain class 
of functions—the Riemann integrable functions. We will now be able to integrate a 
much larger range of functions—the measurable functions. More precisely, we can 
only integrate those measurable functions which are absolutely integrable—but more 
on that later. 


Definition 7.5.1 (Measurable functions) Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and
let $f : \Omega \to \mathbf{R}^m$ be a function. A function $f$ is measurable iff $f^{-1}(V)$ is measurable
for every open set $V \subseteq \mathbf{R}^m$.


As discussed earlier, most sets that we deal with in real life are measurable, so it is 
only natural to learn that most functions we deal with in real life are also measurable. 
For instance, continuous functions are automatically measurable: 


Lemma 7.5.2 (Continuous functions are measurable) Let $\Omega$ be a measurable subset
of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}^m$ be continuous. Then $f$ is also measurable.


Proof Let $V$ be any open subset of $\mathbf{R}^m$. Then since $f$ is continuous, $f^{-1}(V)$ is
open relative to $\Omega$ (see Theorem 2.1.5(c)), i.e., $f^{-1}(V) = W \cap \Omega$ for some open set
$W \subseteq \mathbf{R}^n$ (see Proposition 1.3.4(a)). Since $W$ is open, it is measurable; since $\Omega$ is
measurable, $W \cap \Omega$ is also measurable.


Because of Lemma 7.4.10, we have an easy criterion to test whether a function is 
measurable or not: 


Lemma 7.5.3 Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}^m$ be a function.
Then $f$ is measurable if and only if $f^{-1}(B)$ is measurable for every open box
$B$.


Proof See Exercise 7.5.1. 


Corollary 7.5.4 Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}^m$ be a
function. Suppose that $f = (f_1, \ldots, f_m)$, where $f_j : \Omega \to \mathbf{R}$ is the $j$th co-ordinate
of $f$. Then $f$ is measurable if and only if all of the $f_j$ are individually measurable.


Proof See Exercise 7.5.2. 


Unfortunately, it is not true that the composition of two measurable functions 
is automatically measurable; however we can do the next best thing: a continuous 
function applied to a measurable function is measurable. 


Lemma 7.5.5 Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $W$ be an open subset of
$\mathbf{R}^m$. If $f : \Omega \to W$ is measurable, and $g : W \to \mathbf{R}^p$ is continuous, then
$g \circ f : \Omega \to \mathbf{R}^p$ is measurable.


Proof See Exercise 7.5.3. 


This has an immediate corollary: 


Corollary 7.5.6 Let $\Omega$ be a measurable subset of $\mathbf{R}^n$. If $f : \Omega \to \mathbf{R}$ is a measurable
function, then so are $|f|$, $\max(f, 0)$, and $\min(f, 0)$.


Proof Apply Lemma 7.5.5 with g(x):=|x|, g(x):= max(x,0), and g(x):= 
min(x, 0). 


A slightly less immediate corollary: 


Corollary 7.5.7 Let $\Omega$ be a measurable subset of $\mathbf{R}^n$. If $f : \Omega \to \mathbf{R}$ and $g : \Omega \to \mathbf{R}$
are measurable functions, then so are $f + g$, $f - g$, $fg$, $\max(f, g)$, and $\min(f, g)$.
If $g(x) \neq 0$ for all $x \in \Omega$, then $f/g$ is also measurable.


Proof Consider $f + g$. We can write this as $k \circ h$, where $h : \Omega \to \mathbf{R}^2$ is the function
$h(x) = (f(x), g(x))$, and $k : \mathbf{R}^2 \to \mathbf{R}$ is the function $k(a, b) := a + b$. Since $f, g$ are
measurable, $h$ is also measurable by Corollary 7.5.4. Since $k$ is continuous, we
thus see from Lemma 7.5.5 that $k \circ h$ is measurable, as desired. A similar argument
deals with all the other cases; the only thing concerning the $f/g$ case is that the
space $\mathbf{R}^2$ must be replaced with $\{(a, b) \in \mathbf{R}^2 : b \neq 0\}$ in order to keep the map
$(a, b) \mapsto a/b$ continuous and well-defined.
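
For concreteness, the remaining cases of Corollary 7.5.7 can be written out as below. This is only a sketch: the labels $k_1, \ldots, k_5$ are introduced here for illustration and do not appear in the text, and in each case $h(x) := (f(x), g(x))$ is the measurable map used in the proof.

```latex
\begin{align*}
f - g      &= k_1 \circ h, & k_1(a,b) &:= a - b,       & k_1 &: \mathbf{R}^2 \to \mathbf{R},\\
fg         &= k_2 \circ h, & k_2(a,b) &:= ab,          & k_2 &: \mathbf{R}^2 \to \mathbf{R},\\
\max(f,g)  &= k_3 \circ h, & k_3(a,b) &:= \max(a,b),   & k_3 &: \mathbf{R}^2 \to \mathbf{R},\\
\min(f,g)  &= k_4 \circ h, & k_4(a,b) &:= \min(a,b),   & k_4 &: \mathbf{R}^2 \to \mathbf{R},\\
f/g        &= k_5 \circ h, & k_5(a,b) &:= a/b,         & k_5 &: \{(a,b) \in \mathbf{R}^2 : b \neq 0\} \to \mathbf{R}.
\end{align*}
```

Each of these maps is continuous on its domain, and the domain of $k_5$ is an open subset of $\mathbf{R}^2$, which is what Lemma 7.5.5 requires of the set $W$.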


Another characterization of measurable functions is given by 


Lemma 7.5.8 Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}$ be a function.
Then $f$ is measurable if and only if $f^{-1}((a, \infty))$ is measurable for every real number
$a$.


Proof See Exercise 7.5.4. 


Inspired by this lemma, we extend the notion of a measurable function to the
extended real number system $\mathbf{R}^* := \mathbf{R} \cup \{+\infty\} \cup \{-\infty\}$:


Definition 7.5.9 (Measurable functions in the extended reals) Let $\Omega$ be a measurable
subset of $\mathbf{R}^n$. A function $f : \Omega \to \mathbf{R}^*$ is said to be measurable iff $f^{-1}((a, +\infty])$ is
measurable for every real number $a$.


Note that Lemma 7.5.8 ensures that the notion of measurability for functions 
taking values in the extended reals R* is compatible with that for functions taking 
values in just the reals R. 

Measurability behaves well with respect to limits: 


Lemma 7.5.10 (Limits of measurable functions are measurable) Let $\Omega$ be a measurable
subset of $\mathbf{R}^n$. For each positive integer $n$, let $f_n : \Omega \to \mathbf{R}^*$ be a measurable
function. Then the functions $\sup_{n \geq 1} f_n$, $\inf_{n \geq 1} f_n$, $\limsup_{n \to \infty} f_n$, and $\liminf_{n \to \infty} f_n$
are also measurable. In particular, if the $f_n$ converge pointwise to another function
$f : \Omega \to \mathbf{R}^*$, then $f$ is also measurable.


Proof We first prove the claim about $\sup_{n \geq 1} f_n$. Call this function $g$. We have to prove
that $g^{-1}((a, +\infty])$ is measurable for every $a$. But by the definition of supremum, we
have

\[ g^{-1}((a, +\infty]) = \bigcup_{n \geq 1} f_n^{-1}((a, +\infty]) \]

(why?), and the claim follows since the countable union of measurable sets is again
measurable.

A similar argument works for $\inf_{n \geq 1} f_n$. The claims for lim sup and lim inf then
follow from the identities

\[ \limsup_{n \to \infty} f_n = \inf_{N \geq 1} \sup_{n \geq N} f_n \]

and

\[ \liminf_{n \to \infty} f_n = \sup_{N \geq 1} \inf_{n \geq N} f_n \]

(see Definition 6.4.6).


As you can see, just about anything one does to a measurable function will pro- 
duce another measurable function. This is basically why almost every function one 
deals with in mathematics is measurable. (Indeed, the only way to construct non- 
measurable functions is via artificial means such as invoking the axiom of choice.) 


— Exercises — 


Exercise 7.5.1 Prove Lemma 7.5.3. (Hint: use Lemma 7.4.10 and the $\sigma$-algebra
property.)


Exercise 7.5.2 Use Lemma 7.5.3 to deduce Corollary 7.5.4. 
Exercise 7.5.3 Prove Lemma 7.5.5. 


Exercise 7.5.4 Prove Lemma 7.5.8. (Hint: use Lemma 7.5.3. As a preliminary step,
you may need to show that if $f^{-1}((a, \infty))$ is measurable for all $a$, then $f^{-1}([a, \infty))$
is also measurable for all $a$.)


Exercise 7.5.5 Let $f : \mathbf{R}^n \to \mathbf{R}$ be Lebesgue measurable, and let $g : \mathbf{R}^n \to \mathbf{R}$ be a
function which agrees with $f$ outside of a set of measure zero; thus there exists a set
$A \subseteq \mathbf{R}^n$ of measure zero such that $f(x) = g(x)$ for all $x \in \mathbf{R}^n \setminus A$. Show that $g$ is
also Lebesgue measurable. (Hint: use Exercise 7.4.10.)


Chapter 8
Lebesgue Integration


In Chap. 11, we approached the Riemann integral by first integrating a particularly 
simple class of functions, namely the piecewise constant functions. Among other 
things, piecewise constant functions only attain a finite number of values (as opposed 
to most functions in real life, which can take an infinite number of values). Once one 
learns how to integrate piecewise constant functions, one can then integrate other 
Riemann integrable functions by a similar procedure. 

We shall use a similar philosophy to construct the Lebesgue integral. We shall 
begin by considering a special subclass of measurable functions—the simple func- 
tions. Then we will show how to integrate simple functions, and then from there we 
will integrate all measurable functions (or at least the absolutely integrable ones). 


8.1 Simple Functions 


Definition 8.1.1 (Simple functions) Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let
$f : \Omega \to \mathbf{R}$ be a measurable function. We say that $f$ is a simple function if the
image $f(\Omega)$ is finite. In other words, there exists a finite number of real numbers
$c_1, c_2, \ldots, c_N$ such that for every $x \in \Omega$, we have $f(x) = c_j$ for some $1 \leq j \leq N$.


Example 8.1.2 Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $E$ be a measurable
subset of $\Omega$. We define the characteristic function $\chi_E : \Omega \to \mathbf{R}$ by setting $\chi_E(x) := 1$
if $x \in E$, and $\chi_E(x) := 0$ if $x \notin E$. (In some texts, $\chi_E$ is also written $1_E$, and is referred
to as an indicator function.) Then $\chi_E$ is a measurable function (why?) and is a simple
function, because the image $\chi_E(\Omega)$ is $\{0, 1\}$ (or $\{0\}$ if $E$ is empty, or $\{1\}$ if $E = \Omega$).


We remark on three basic properties of simple functions: that they form a vec- 
tor space, that they are linear combinations of characteristic functions, and that they 
approximate measurable functions. More precisely, we have the following three lem- 
mas: 


Lemma 8.1.3 Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}$ and $g : \Omega \to \mathbf{R}$
be simple functions. Then $f + g$ is also a simple function. Also, for any scalar
$c \in \mathbf{R}$, the function $cf$ is also a simple function.


Proof See Exercise 8.1.1. 


Lemma 8.1.4 Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}$ be a simple
function. Then there exists a finite number of real numbers $c_1, \ldots, c_N$, and a finite
number of disjoint measurable sets $E_1, E_2, \ldots, E_N$ in $\Omega$, such that $f = \sum_{i=1}^N c_i \chi_{E_i}$.


Proof See Exercise 8.1.2. 


Lemma 8.1.5 Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to [0, +\infty]$ be a
measurable function. Then there exists a sequence $f_1, f_2, f_3, \ldots$ of simple functions,
$f_n : \Omega \to \mathbf{R}$, such that the $f_n$ are non-negative and increasing,

\[ 0 \leq f_1(x) \leq f_2(x) \leq f_3(x) \leq \ldots \quad \text{for all } x \in \Omega, \]

and converge pointwise to $f$:

\[ \lim_{n \to \infty} f_n(x) = f(x) \quad \text{for all } x \in \Omega. \]


Proof See Exercise 8.1.3.


We now show how to compute the integral of simple functions.


Definition 8.1.6 (Lebesgue integral of simple functions) Let $\Omega$ be a measurable
subset of $\mathbf{R}^n$, and let $f : \Omega \to \mathbf{R}$ be a simple function which is non-negative; thus $f$
is measurable and the image $f(\Omega)$ is finite and contained in $[0, \infty)$. We then define
the Lebesgue integral $\int_\Omega f$ of $f$ on $\Omega$ by

\[ \int_\Omega f := \sum_{\lambda \in f(\Omega);\, \lambda > 0} \lambda \, m(\{x \in \Omega : f(x) = \lambda\}). \]

We will also sometimes write $\int_\Omega f$ as $\int_\Omega f \, dm$ (to emphasize the role of Lebesgue
measure $m$) or use a dummy variable such as $x$, e.g., $\int_\Omega f(x) \, dx$.


Example 8.1.7 Let $f : \mathbf{R} \to \mathbf{R}$ be the function which equals 3 on the interval $[1, 2]$,
equals 4 on the interval $(2, 4)$, and is zero everywhere else. Then

\[ \int_\Omega f := 3 \times m([1, 2]) + 4 \times m((2, 4)) = 3 \times 1 + 4 \times 2 = 11. \]

Or if $g : \mathbf{R} \to \mathbf{R}$ is the function which equals 1 on $[0, \infty)$ and is zero everywhere
else, then

\[ \int_\Omega g = 1 \times m([0, \infty)) = 1 \times +\infty = +\infty. \]

Thus the simple integral of a simple function can equal $+\infty$. (The reason why
we restrict this integral to non-negative functions is to avoid ever encountering the
indefinite form $+\infty + (-\infty)$.)
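
The two computations in Example 8.1.7 can be reproduced with a small sketch of Definition 8.1.6. It assumes the simple function is given as a list of pieces on pairwise disjoint intervals of the real line; the names `interval_length` and `simple_integral` are hypothetical helpers introduced only for this illustration.

```python
import math


def interval_length(a, b):
    """Lebesgue measure of an interval with endpoints a <= b (b may be math.inf)."""
    return math.inf if b == math.inf else b - a


def simple_integral(pieces):
    """Integral of a non-negative simple function on R, following Definition 8.1.6.

    `pieces` is a list of (value, a, b) triples: the function equals `value` on an
    interval with endpoints a and b and is zero elsewhere. The intervals are assumed
    to be pairwise disjoint; whether endpoints are included does not change the measure.
    """
    # m({x : f(x) = lam}) is the total length of the pieces carrying the value lam.
    level_measure = {}
    for value, a, b in pieces:
        if value < 0:
            raise ValueError("only non-negative simple functions are handled")
        if value == 0:
            continue  # the sum in the definition ranges over lam > 0 only
        level_measure[value] = level_measure.get(value, 0) + interval_length(a, b)
    # Sum of lam * m({f = lam}) over the positive values lam of f.
    return sum(lam * measure for lam, measure in level_measure.items())


f_integral = simple_integral([(3, 1, 2), (4, 2, 4)])  # 3*1 + 4*2
g_integral = simple_integral([(1, 0, math.inf)])       # 1 * (+infinity)
```

Grouping the pieces by value before summing mirrors the definition, which sums over the distinct values of $f$ rather than over the pieces.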


Remark 8.1.8 Note that this definition of integral corresponds to one’s intuitive 
notion of integration (at least of non-negative functions) as the area under the graph 
of the function (or volume, if one is in higher dimensions). 


Another formulation of the integral for non-negative simple functions is as follows. 


Lemma 8.1.9 Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $E_1, \ldots, E_N$ be a finite
number of disjoint measurable subsets in $\Omega$. Let $c_1, \ldots, c_N$ be non-negative numbers
(not necessarily distinct). Then we have

\[ \int_\Omega \sum_{j=1}^N c_j \chi_{E_j} = \sum_{j=1}^N c_j \, m(E_j). \]


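Proof We can assume that none of the $c_j$ are zero, since we can just remove them
from the sum on both sides of the equation. Let $f := \sum_{j=1}^N c_j \chi_{E_j}$. Then $f(x)$ is either
equal to one of the $c_j$ (if $x \in E_j$) or equal to 0 (if $x \notin \bigcup_{j=1}^N E_j$). Thus $f$ is a simple
function, and $f(\Omega) \subseteq \{0\} \cup \{c_j : 1 \leq j \leq N\}$. Thus, by the definition,

\[ \int_\Omega f = \sum_{\lambda \in \{c_j : 1 \leq j \leq N\}} \lambda \, m(\{x \in \Omega : f(x) = \lambda\})
   = \sum_{\lambda \in \{c_j : 1 \leq j \leq N\}} \lambda \, m\Big( \bigcup_{1 \leq j \leq N : c_j = \lambda} E_j \Big). \]

But by the finite additivity property of Lebesgue measure, this is equal to

\[ \sum_{\lambda \in \{c_j : 1 \leq j \leq N\}} \lambda \sum_{1 \leq j \leq N : c_j = \lambda} m(E_j)
   = \sum_{\lambda \in \{c_j : 1 \leq j \leq N\}} \ \sum_{1 \leq j \leq N : c_j = \lambda} c_j \, m(E_j). \]

Each $j$ appears exactly once in this sum, since $c_j$ is only equal to exactly one value
of $\lambda$. So the above expression is equal to $\sum_{j=1}^N c_j \, m(E_j)$ as desired.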


Some basic properties of Lebesgue integration of non-negative simple functions: 


Proposition 8.1.10 Let $\Omega$ be a measurable set, and let $f : \Omega \to \mathbf{R}$ and $g : \Omega \to \mathbf{R}$
be non-negative simple functions.

(a) We have $0 \leq \int_\Omega f \leq \infty$. Furthermore, we have $\int_\Omega f = 0$ if and only if
$m(\{x \in \Omega : f(x) \neq 0\}) = 0$.

(b) We have $\int_\Omega (f + g) = \int_\Omega f + \int_\Omega g$.

(c) For any positive number $c$, we have $\int_\Omega cf = c \int_\Omega f$.

(d) If $f(x) \leq g(x)$ for all $x \in \Omega$, then we have $\int_\Omega f \leq \int_\Omega g$.


We make a very convenient notational convention: if a property $P(x)$ holds for
all points in $\Omega$, except for a set of measure zero, then we say that $P$ holds for almost
every point in $\Omega$. Thus (a) asserts that $\int_\Omega f = 0$ if and only if $f$ is zero for almost
every point in $\Omega$.


Proof From Lemma 8.1.4 or from the formula

\[ f = \sum_{\lambda \in f(\Omega) \setminus \{0\}} \lambda \, \chi_{\{x \in \Omega : f(x) = \lambda\}} \]

we can write $f$ as a combination of characteristic functions, say

\[ f = \sum_{j=1}^N c_j \chi_{E_j}, \]

where $E_1, \ldots, E_N$ are disjoint subsets of $\Omega$ and the $c_j$ are positive. Similarly we can
write

\[ g = \sum_{k=1}^M d_k \chi_{F_k}, \]

where $F_1, \ldots, F_M$ are disjoint subsets of $\Omega$ and the $d_k$ are positive.


(a) Since $\int_\Omega f = \sum_{j=1}^N c_j \, m(E_j)$ it is clear that the integral is between 0 and infinity.
If $f$ is zero almost everywhere, then all of the $E_j$ must have measure zero (why?)
and so $\int_\Omega f = 0$. Conversely, if $\int_\Omega f = 0$, then $\sum_{j=1}^N c_j \, m(E_j) = 0$, which can
only happen when all of the $m(E_j)$ are zero (since all the $c_j$ are positive). But
then $\bigcup_{j=1}^N E_j$ has measure zero, and hence $f$ is zero almost everywhere in $\Omega$.


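(b) Write $E_0 := \Omega \setminus \bigcup_{j=1}^N E_j$ and $c_0 := 0$; then we have $\Omega = E_0 \cup E_1 \cup \ldots \cup E_N$ and

\[ f = \sum_{j=0}^N c_j \chi_{E_j}. \]

Similarly if we write $F_0 := \Omega \setminus \bigcup_{k=1}^M F_k$ and $d_0 := 0$ then

\[ g = \sum_{k=0}^M d_k \chi_{F_k}. \]

Since $\Omega = E_0 \cup \ldots \cup E_N = F_0 \cup \ldots \cup F_M$, we have

\[ f = \sum_{j=0}^N \sum_{k=0}^M c_j \chi_{E_j \cap F_k} \]

and

\[ g = \sum_{k=0}^M \sum_{j=0}^N d_k \chi_{E_j \cap F_k} \]

and hence

\[ f + g = \sum_{0 \leq j \leq N;\, 0 \leq k \leq M} (c_j + d_k) \chi_{E_j \cap F_k}. \]

By Lemma 8.1.9, we thus have

\[ \int_\Omega (f + g) = \sum_{0 \leq j \leq N;\, 0 \leq k \leq M} (c_j + d_k) \, m(E_j \cap F_k). \]

On the other hand, we have

\[ \int_\Omega f = \sum_{0 \leq j \leq N} c_j \, m(E_j) = \sum_{0 \leq j \leq N;\, 0 \leq k \leq M} c_j \, m(E_j \cap F_k) \]

and similarly

\[ \int_\Omega g = \sum_{0 \leq k \leq M} d_k \, m(F_k) = \sum_{0 \leq j \leq N;\, 0 \leq k \leq M} d_k \, m(E_j \cap F_k) \]

and the claim (b) follows.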

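(c) Since $cf = \sum_{j=1}^N c c_j \chi_{E_j}$, we have $\int_\Omega cf = \sum_{j=1}^N c c_j \, m(E_j)$. Since $\int_\Omega f =
\sum_{j=1}^N c_j \, m(E_j)$, the claim follows.

(d) Write $h := g - f$. Then $h$ is simple and non-negative and $g = f + h$, hence by
(b) we have $\int_\Omega g = \int_\Omega f + \int_\Omega h$. But by (a) we have $\int_\Omega h \geq 0$, and the claim
follows.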


— Exercise — 
Exercise 8.1.1 Prove Lemma 8.1.3. 
Exercise 8.1.2 Prove Lemma 8.1.4. 


Exercise 8.1.3 Prove Lemma 8.1.5. (Hint: set

\[ f_n(x) := \sup\left\{ \frac{j}{2^n} : j \in \mathbf{Z},\ \frac{j}{2^n} \leq \min(f(x), 2^n) \right\}, \]

i.e., $f_n(x)$ is the greatest integer multiple of $2^{-n}$ which does not exceed either $f(x)$
or $2^n$. You may wish to draw a picture to see how $f_1, f_2, f_3$, etc., work. Then prove
that $f_n$ obeys all the required properties.)
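
As an alternative to drawing a picture, the construction in the hint can be explored numerically. The sketch below assumes $f$ is given as an ordinary Python function returning non-negative floats (possibly `math.inf`); the name `dyadic_approx` is a hypothetical helper introduced for this illustration, and the code is an experiment, not a proof of Lemma 8.1.5.

```python
import math


def dyadic_approx(f, n):
    """Return f_n, where f_n(x) is the greatest integer multiple of 2**-n
    that does not exceed min(f(x), 2**n)."""
    def f_n(x):
        capped = min(f(x), 2.0 ** n)            # the cap also handles f(x) = +infinity
        return math.floor(capped * 2 ** n) / 2 ** n
    return f_n


def square(x):
    """Sample non-negative function used for the experiment."""
    return x * x


# Values f_1(1.7), ..., f_8(1.7): they increase towards square(1.7), about 2.89.
approximations = [dyadic_approx(square, n)(1.7) for n in range(1, 9)]
```

Each $f_n$ takes only finitely many values (multiples of $2^{-n}$ between $0$ and $2^n$), which is why it is a candidate simple function once measurability has been checked.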


8.2 Integration of Non-negative Measurable Functions 


We now pass from the integration of non-negative simple functions to the integration 
of non-negative measurable functions. We will allow our measurable functions to
take the value of $+\infty$ sometimes.


Definition 8.2.1 (Majorization) Let $f : \Omega \to \mathbf{R}$ and $g : \Omega \to \mathbf{R}$ be functions. We
say that $f$ majorizes $g$, or $g$ minorizes $f$, if we have $f(x) \geq g(x)$ for all $x \in \Omega$.


We sometimes use the phrase "$f$ dominates $g$" instead of "$f$ majorizes $g$".


Definition 8.2.2 (Lebesgue integral for non-negative functions) Let $\Omega$ be a measurable
subset of $\mathbf{R}^n$, and let $f : \Omega \to [0, \infty]$ be measurable and non-negative. Then
we define the Lebesgue integral $\int_\Omega f$ of $f$ on $\Omega$ to be

\[ \int_\Omega f := \sup\left\{ \int_\Omega s : s \text{ is simple and non-negative, and minorizes } f \right\}. \]


Remark 8.2.3 The reader should compare this notion to that of a lower Riemann 
integral from Definition 11.3.2. Interestingly, we will not need to match this lower 
integral with an upper integral here. 


Remark 8.2.4 Note that if $\Omega'$ is any measurable subset of $\Omega$, then we can define
$\int_{\Omega'} f$ as well by restricting $f$ to $\Omega'$, thus $\int_{\Omega'} f := \int_{\Omega'} f|_{\Omega'}$.


We have to check that this definition is consistent with our previous notion of
Lebesgue integral for non-negative simple functions; in other words, if $f : \Omega \to \mathbf{R}$
is a non-negative simple function, then the value of $\int_\Omega f$ given by this definition
should be the same as the one given in the previous definition. But this is clear
because $f$ certainly minorizes itself, and any other non-negative simple function $s$
which minorizes $f$ will have an integral $\int_\Omega s$ less than or equal to $\int_\Omega f$, thanks to
Proposition 8.1.10(d).


Remark 8.2.5 Note that $\int_\Omega f$ is always at least 0, since 0 is simple, non-negative,
and minorizes $f$. Of course, $\int_\Omega f$ could equal $+\infty$.


Some basic properties of the Lebesgue integral on non-negative measurable functions
(which supersede Proposition 8.1.10):


Proposition 8.2.6 Let $\Omega$ be a measurable set, and let $f : \Omega \to [0, \infty]$ and $g : \Omega \to
[0, \infty]$ be non-negative measurable functions.

(a) We have $0 \leq \int_\Omega f \leq \infty$. Furthermore, we have $\int_\Omega f = 0$ if and only if $f(x) = 0$
for almost every $x \in \Omega$.

(b) For any positive number $c$, we have $\int_\Omega cf = c \int_\Omega f$.

(c) If $f(x) \leq g(x)$ for all $x \in \Omega$, then we have $\int_\Omega f \leq \int_\Omega g$.

(d) If $f(x) = g(x)$ for almost every $x \in \Omega$, then $\int_\Omega f = \int_\Omega g$.

(e) If $\Omega' \subseteq \Omega$ is measurable, then $\int_{\Omega'} f = \int_\Omega f \chi_{\Omega'} \leq \int_\Omega f$.


Proof See Exercise 8.2.1. 


Remark 8.2.7 Proposition 8.2.6(d) is quite interesting; it says that one can modify 
the values of a function on any measure zero set (e.g., you can modify a function on 
every rational number), and not affect its integral at all. It is as if no individual point, 
or even a measure zero collection of points, has any “vote” in what the integral of a 
function should be; only the collective set of points has an influence on an integral. 


Remark 8.2.8 Note that we do not yet try to interchange sums and integrals. From
the definition it is fairly easy to prove that $\int_\Omega (f + g) \geq \int_\Omega f + \int_\Omega g$ (Exercise 8.2.2),
but to prove equality requires more work and will be done later.


As we have seen in previous chapters, we cannot always interchange an integral 
with a limit (or with limit-like concepts such as supremum). However, with the 
Lebesgue integral it is possible to do so if the functions are increasing: 


Theorem 8.2.9 (Lebesgue monotone convergence theorem) Let $\Omega$ be a measurable
subset of $\mathbf{R}^n$, and let $(f_n)_{n=1}^\infty$ be a sequence of non-negative measurable functions
from $\Omega$ to $[0, +\infty]$ which are increasing in the sense that

\[ 0 \leq f_1(x) \leq f_2(x) \leq f_3(x) \leq \ldots \quad \text{for all } x \in \Omega. \]

(Note we are assuming that $f_n(x)$ is increasing with respect to $n$; this is a different
notion from $f_n(x)$ increasing with respect to $x$.) Then we have

\[ 0 \leq \int_\Omega f_1 \leq \int_\Omega f_2 \leq \int_\Omega f_3 \leq \ldots \]

and

\[ \int_\Omega \sup_n f_n = \sup_n \int_\Omega f_n. \]


Proof The first conclusion is clear from Proposition 8.2.6(c). Now we prove the
second conclusion. From Proposition 8.2.6(c) again we have

\[ \int_\Omega \sup_m f_m \geq \int_\Omega f_n \]

for every $n$; taking suprema in $n$ we obtain

\[ \int_\Omega \sup_m f_m \geq \sup_n \int_\Omega f_n \]

which is one half of the desired conclusion. To finish the proof we have to show

\[ \int_\Omega \sup_m f_m \leq \sup_n \int_\Omega f_n. \]

From the definition of $\int_\Omega \sup_m f_m$, it will suffice to show that

\[ \int_\Omega s \leq \sup_n \int_\Omega f_n \]

for all simple non-negative functions $s$ which minorize $\sup_m f_m$.


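Fix $s$. We will show that

\[ (1 - \varepsilon) \int_\Omega s \leq \sup_n \int_\Omega f_n \]

for every $0 < \varepsilon < 1$; the claim then follows by taking limits as $\varepsilon \to 0$.
Fix $\varepsilon$. By construction of $s$, we have

\[ s(x) \leq \sup_n f_n(x) \]

for every $x \in \Omega$. Hence, for every $x \in \Omega$ there exists an $N$ (depending on $x$) such
that

\[ f_N(x) \geq (1 - \varepsilon) s(x). \]

Since the $f_n$ are increasing, this will imply that $f_n(x) \geq (1 - \varepsilon) s(x)$ for all $n \geq N$.
Thus, if we define the sets $E_n$ by

\[ E_n := \{x \in \Omega : f_n(x) \geq (1 - \varepsilon) s(x)\} \]

then we have $E_1 \subseteq E_2 \subseteq E_3 \subseteq \ldots$ and $\bigcup_{n=1}^\infty E_n = \Omega$.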


It is not difficult to check that all the $E_n$ are measurable. From Proposition
8.2.6(bce) we have

\[ (1 - \varepsilon) \int_{E_n} s = \int_{E_n} (1 - \varepsilon) s \leq \int_{E_n} f_n \leq \int_\Omega f_n \]

so to finish the argument it will suffice to show that

\[ \sup_n \int_{E_n} s = \int_\Omega s. \]

Since $s$ is a simple function, we may write $s = \sum_{j=1}^N c_j \chi_{F_j}$ for some measurable $F_j$
and positive $c_j$. Since

\[ \int_\Omega s = \sum_{j=1}^N c_j \, m(F_j) \]

and

\[ \int_{E_n} s = \int_{E_n} \sum_{j=1}^N c_j \chi_{F_j \cap E_n} = \sum_{j=1}^N c_j \, m(F_j \cap E_n) \]

it thus suffices to show that

\[ \sup_n m(F_j \cap E_n) = m(F_j) \]

for each $j$. But this follows from Exercise 7.2.3(a).


This theorem is extremely useful. For instance, we can now interchange addition 
and integration: 


Lemma 8.2.10 (Interchange of addition and integration) Let $\Omega$ be a measurable
subset of $\mathbf{R}^n$, and let $f : \Omega \to [0, \infty]$ and $g : \Omega \to [0, \infty]$ be measurable functions.
Then $\int_\Omega (f + g) = \int_\Omega f + \int_\Omega g$.


Proof By Lemma 8.1.5, there exists a sequence $0 \leq s_1 \leq s_2 \leq \ldots \leq f$ of simple
functions such that $\sup_n s_n = f$, and similarly a sequence $0 \leq t_1 \leq t_2 \leq \ldots \leq g$ of
simple functions such that $\sup_n t_n = g$. Since the $s_n$ are increasing and the $t_n$ are
increasing, it is then easy to check that $s_n + t_n$ is also increasing and $\sup_n (s_n + t_n) =
f + g$ (why?). By the monotone convergence theorem (Theorem 8.2.9) we thus have

\[ \int_\Omega f = \sup_n \int_\Omega s_n, \]

\[ \int_\Omega g = \sup_n \int_\Omega t_n, \]

\[ \int_\Omega (f + g) = \sup_n \int_\Omega (s_n + t_n). \]

But by Proposition 8.1.10(b) we have $\int_\Omega (s_n + t_n) = \int_\Omega s_n + \int_\Omega t_n$. By Proposition
8.1.10(d), $\int_\Omega s_n$ and $\int_\Omega t_n$ are both increasing in $n$, so

\[ \sup_n \left( \int_\Omega s_n + \int_\Omega t_n \right) = \left( \sup_n \int_\Omega s_n \right) + \left( \sup_n \int_\Omega t_n \right) \]

and the claim follows.


Of course, once one can interchange an integral with a sum of two functions, 
one can handle an integral and any finite number of functions by induction. More 
surprisingly, one can handle infinite sums as well of non-negative functions: 


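Corollary 8.2.11 If $\Omega$ is a measurable subset of $\mathbf{R}^n$, and $g_1, g_2, \ldots$ are a sequence
of non-negative measurable functions from $\Omega$ to $[0, \infty]$, then

\[ \int_\Omega \sum_{n=1}^\infty g_n = \sum_{n=1}^\infty \int_\Omega g_n. \]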

Proof See Exercise 8.2.3. 


Remark 8.2.12 Note that we do not need to assume anything about the convergence 
of the above sums; it may well happen that both sides are equal to $+\infty$. However,
we do need to assume non-negativity; see Exercise 8.3.4. 


One could similarly ask whether we could interchange limits and integrals; in
other words, is it true that

\[ \int_\Omega \lim_{n \to \infty} f_n = \lim_{n \to \infty} \int_\Omega f_n ? \]

Unfortunately, this is not true, as the following "moving bump" example shows. For
each $n = 1, 2, 3, \ldots$, let $f_n : \mathbf{R} \to \mathbf{R}$ be the function $f_n = \chi_{[n, n+1)}$. Then
$\lim_{n \to \infty} f_n(x) = 0$ for every $x$, but $\int_{\mathbf{R}} f_n = 1$ for every $n$, and hence
$\lim_{n \to \infty} \int_{\mathbf{R}} f_n = 1 \neq 0$. In other words, the limiting function $\lim_{n \to \infty} f_n$ can end
up having significantly smaller integral than any of the original integrals. However,
the following very useful lemma of Fatou shows that the reverse cannot happen: there
is no way the limiting function has larger integral than the (limit of the) original
integrals:


Lemma 8.2.13 (Fatou's lemma) Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let
$f_1, f_2, \ldots$ be a sequence of non-negative measurable functions from $\Omega$ to $[0, \infty]$. Then

\[ \int_\Omega \liminf_{n \to \infty} f_n \leq \liminf_{n \to \infty} \int_\Omega f_n. \]
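

Proof Recall that

\[ \liminf_{n \to \infty} f_n = \sup_n \left( \inf_{m \geq n} f_m \right) \]

and hence by the monotone convergence theorem

\[ \int_\Omega \liminf_{n \to \infty} f_n = \sup_n \int_\Omega \left( \inf_{m \geq n} f_m \right). \]

By Proposition 8.2.6(c) we have

\[ \int_\Omega \left( \inf_{m \geq n} f_m \right) \leq \int_\Omega f_j \]

for every $j \geq n$; taking infima in $j$ we obtain

\[ \int_\Omega \left( \inf_{m \geq n} f_m \right) \leq \inf_{j \geq n} \int_\Omega f_j. \]

Thus

\[ \int_\Omega \liminf_{n \to \infty} f_n \leq \sup_n \inf_{j \geq n} \int_\Omega f_j = \liminf_{n \to \infty} \int_\Omega f_n \]

as desired.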
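
The moving bump example shows that the inequality in Fatou's lemma can be strict. The following sketch simply tabulates the two sides in that example; the function names `bump`, `bump_integral` and `pointwise_tail` are hypothetical and introduced only for this illustration, and the integral is computed from the known measure of $[n, n+1)$ rather than numerically.

```python
def bump(n):
    """Indicator function of [n, n+1), the n-th moving bump."""
    return lambda x: 1 if n <= x < n + 1 else 0


def bump_integral(n):
    """Integral of the n-th bump over R, i.e. the measure of [n, n+1)."""
    return (n + 1) - n


def pointwise_tail(x, start, stop):
    """The values f_n(x) for start <= n < stop."""
    return [bump(n)(x) for n in range(start, stop)]


# At the fixed point x = 3.5 every bump with n >= 4 has already moved past x,
# so the pointwise limit (and lim inf) at x is 0 ...
tail_at_point = pointwise_tail(3.5, 4, 50)
# ... while each individual integral stays equal to 1.
integrals = [bump_integral(n) for n in range(1, 50)]
```

Here the left-hand side of Fatou's lemma is the integral of the zero function, namely 0, while the right-hand side is the lim inf of the constant sequence 1.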


Note that we are allowing our functions to take the value $+\infty$ at some points. It is
even possible for a function to take the value $+\infty$ but still have a finite integral; for
instance, if $E$ is a measure zero set, and $f : \Omega \to \mathbf{R}^*$ is equal to $+\infty$ on $E$ but equals
0 everywhere else, then $\int_\Omega f = 0$ by Proposition 8.2.6(a). However, if the integral
is finite, the function must be finite almost everywhere:


Lemma 8.2.14 Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to [0, \infty]$ be a
non-negative measurable function such that $\int_\Omega f$ is finite. Then $f$ is finite almost
everywhere (i.e., the set $\{x \in \Omega : f(x) = +\infty\}$ has measure zero).


Proof See Exercise 8.2.4. 


From Corollary 8.2.11 and Lemma 8.2.14 one has a useful lemma:


Lemma 8.2.15 (Borel-Cantelli lemma) Let $\Omega_1, \Omega_2, \ldots$ be measurable subsets of
$\mathbf{R}^n$ such that $\sum_{n=1}^\infty m(\Omega_n)$ is finite. Then the set

\[ \{x \in \mathbf{R}^n : x \in \Omega_n \text{ for infinitely many } n\} \]

is a set of measure zero. In other words, almost every point belongs to only finitely
many $\Omega_n$.


Proof See Exercise 8.2.5.


— Exercise — 


Exercise 8.2.1 Prove Proposition 8.2.6. (Hint: do not attempt to mimic the proof
of Proposition 8.1.10; rather, try to use Proposition 8.1.10 and Definition 8.2.2. For
one direction of part (a), start with $\int_\Omega f = 0$ and conclude that $m(\{x \in \Omega : f(x) >
1/n\}) = 0$ for every $n = 1, 2, 3, \ldots$, and then use the countable subadditivity. To
prove (e), first prove it for simple functions.)


Exercise 8.2.2 Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f : \Omega \to [0, +\infty]$ and
$g : \Omega \to [0, +\infty]$ be measurable functions. Without using Theorem 8.2.9 or Lemma
8.2.10, prove that $\int_\Omega (f + g) \geq \int_\Omega f + \int_\Omega g$.


Exercise 8.2.3 Prove Corollary 8.2.11. (Hint: use the monotone convergence theorem
with $F_N := \sum_{n=1}^N g_n$.)


Exercise 8.2.4 Prove Lemma 8.2.14. 


Exercise 8.2.5 Use Corollary 8.2.11 and Lemma 8.2.14 to prove Lemma 8.2.15.
(Hint: use the indicator functions $\chi_{\Omega_n}$.)


Exercise 8.2.6 Let $p > 2$ and $c > 0$. Using the Borel-Cantelli lemma, show that
the set

\[ \left\{ x \in [0, 1] : \left| x - \frac{a}{q} \right| \leq \frac{c}{q^p} \text{ for infinitely many positive integers } a, q \right\} \]

has measure zero. (Hint: one only has to consider those integers $a$ in the range
$0 \leq a \leq q$ (why?). Use Corollary 11.6.5 to show that the sum $\sum_{q=1}^\infty \frac{c(q+1)}{q^p}$ is finite.)


Exercise 8.2.7 Call a real number $x \in \mathbf{R}$ diophantine if there exist real numbers
$p, C > 0$ such that $|x - \frac{a}{q}| > C/|q|^p$ for all nonzero integers $q$ and all integers $a$.
Using Exercise 8.2.6, show that almost every real number is diophantine. (Hint: first 
work in the interval [0, 1]. Show that one can take p and C to be rational and one 
can also take p > 2. Then use the fact that the countable union of measure zero sets 
has measure zero.) 


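Exercise 8.2.8 For every positive integer $n$, let $f_n : \mathbf{R} \to [0, \infty)$ be a non-negative
measurable function such that

\[ \int_{\mathbf{R}} f_n \leq \frac{1}{4^n}. \]

Show that for every $\varepsilon > 0$, there exists a set $E$ of Lebesgue measure $m(E) \leq \varepsilon$
such that $f_n(x)$ converges pointwise to zero for all $x \in \mathbf{R} \setminus E$. (Hint: first prove that
$m(\{x \in \mathbf{R} : f_n(x) > \frac{1}{\varepsilon 2^n}\}) \leq \frac{\varepsilon}{2^n}$ for all $n = 1, 2, 3, \ldots$, and then consider the union
of all the sets $\{x \in \mathbf{R} : f_n(x) > \frac{1}{\varepsilon 2^n}\}$.)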


Exercise 8.2.9 For every positive integer $n$, let $f_n : [0, 1] \to [0, \infty)$ be a non-negative
measurable function such that $f_n$ converges pointwise to zero. Show that
for every $\varepsilon > 0$, there exists a set $E$ of Lebesgue measure $m(E) \leq \varepsilon$ such that $f_n(x)$
converges uniformly to zero for all $x \in [0, 1] \setminus E$. (This is a special case of Egoroff's
theorem. To prove it, first show that for any positive integer $m$, we can find an $N > 0$
such that $m(\{x \in [0, 1] : f_n(x) > 1/m \text{ for all } n \geq N\}) \leq \varepsilon/2^m$.) Is the claim still
true if $[0, 1]$ is replaced by $\mathbf{R}$?


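Exercise 8.2.10 Give an example of a bounded non-negative function $f : \mathbf{N} \times \mathbf{N} \to
\mathbf{R}^+$ such that $\sum_{m=1}^\infty f(n, m)$ converges for every $n$, and such that $\lim_{n \to \infty} f(n, m)$
exists for every $m$, but such that

\[ \lim_{n \to \infty} \sum_{m=1}^\infty f(n, m) \neq \sum_{m=1}^\infty \lim_{n \to \infty} f(n, m). \]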


(Hint: modify the moving bump example. It is even possible to use a function f which 
only takes the values 0 and 1.) This shows that interchanging limits and infinite sums 
can be dangerous. 


8.3 Integration of Absolutely Integrable Functions 


We have now completed the theory of the Lebesgue integral for non-negative func- 
tions. Now we consider how to integrate functions which can be both positive and 
negative. However, we do wish to avoid the indefinite expression $+\infty + (-\infty)$, so
we will restrict our attention to a subclass of measurable functions—the absolutely 
integrable functions. 


Definition 8.3.1 (Absolutely integrable functions) Let $\Omega$ be a measurable subset of
$\mathbf{R}^n$. A measurable function $f : \Omega \to \mathbf{R}^*$ is said to be absolutely integrable if the
integral $\int_\Omega |f|$ is finite.


Of course, $|f|$ is always non-negative, so this definition makes sense even if $f$
changes sign. Absolutely integrable functions are also known as $L^1(\Omega)$ functions.

If $f : \Omega \to \mathbf{R}^*$ is a function, we define the positive part $f^+ : \Omega \to [0, \infty]$ and
negative part $f^- : \Omega \to [0, \infty]$ by the formulae

\[ f^+ := \max(f, 0); \quad f^- := -\min(f, 0). \]

From Corollary 7.5.6 (which can be extended to $\mathbf{R}^*$-valued functions without difficulty)
we know that $f^+$ and $f^-$ are measurable. Observe also that $f^+$ and $f^-$ are
non-negative, that $f = f^+ - f^-$, and $|f| = f^+ + f^-$. (Why?)


Definition 8.3.2 (Lebesgue integral) Let $f : \Omega \to \mathbf{R}^*$ be an absolutely integrable
function. We define the Lebesgue integral $\int_\Omega f$ of $f$ to be the quantity

\[ \int_\Omega f := \int_\Omega f^+ - \int_\Omega f^-. \]

Note that since $f$ is absolutely integrable, $\int_\Omega f^+$ and $\int_\Omega f^-$ are less than or equal
to $\int_\Omega |f|$ and hence are finite. Thus $\int_\Omega f$ is always finite; we are never encountering
the indeterminate form $+\infty - (+\infty)$.

Note that this definition is consistent with our previous definition of the Lebesgue
integral for non-negative functions, since if $f$ is non-negative then $f^+ = f$ and
$f^- = 0$. We also have the useful triangle inequality

\[ \left| \int_\Omega f \right| \leq \int_\Omega f^+ + \int_\Omega f^- = \int_\Omega |f| \qquad (8.1) \]

(Exercise 8.3.1).
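
The decomposition into positive and negative parts, and the triangle inequality (8.1), can be checked on a signed step function. The sketch below assumes the function is described by finitely many (value, length) pieces on disjoint sets of finite measure; `positive_part`, `negative_part` and `integrate_signed` are hypothetical helper names used only for this illustration.

```python
def positive_part(value):
    """f^+ = max(f, 0), applied to a single value."""
    return max(value, 0)


def negative_part(value):
    """f^- = -min(f, 0), applied to a single value."""
    return -min(value, 0)


def integrate_signed(pieces):
    """Return (integral of f, integral of |f|) for a signed step function.

    `pieces` is a list of (value, measure) pairs: f equals `value` on a set of the
    given finite measure, the sets are assumed disjoint, and f is zero elsewhere.
    """
    integral_plus = sum(positive_part(v) * m for v, m in pieces)
    integral_minus = sum(negative_part(v) * m for v, m in pieces)
    # Definition 8.3.2: the integral of f is the difference of the two parts;
    # the integral of |f| = f^+ + f^- is their sum.
    return integral_plus - integral_minus, integral_plus + integral_minus


# f = 3 on a set of measure 1 and f = -2 on a disjoint set of measure 3.
integral_f, integral_abs_f = integrate_signed([(3, 1), (-2, 3)])
```

In this example the integral of $f^+$ is 3 and the integral of $f^-$ is 6, so the two outputs differ in absolute value, illustrating that (8.1) is in general a strict inequality.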


Some other properties of the Lebesgue integral: 


Proposition 8.3.3 Let $\Omega$ be a measurable set, and let $f : \Omega \to \mathbf{R}$ and $g : \Omega \to \mathbf{R}$
be absolutely integrable functions.

(a) For any real number $c$ (positive, zero, or negative), we have that $cf$ is absolutely
integrable and $\int_\Omega cf = c \int_\Omega f$.

(b) The function $f + g$ is absolutely integrable, and $\int_\Omega (f + g) = \int_\Omega f + \int_\Omega g$.

(c) If $f(x) \leq g(x)$ for all $x \in \Omega$, then we have $\int_\Omega f \leq \int_\Omega g$.

(d) If $f(x) = g(x)$ for almost every $x \in \Omega$, then $\int_\Omega f = \int_\Omega g$.


Proof See Exercise 8.3.2. 


As mentioned in the previous section, one cannot necessarily interchange limits
and integrals, $\lim \int f_n = \int \lim f_n$, as the "moving bump example" showed. However,
it is possible to exclude the moving bump example and successfully interchange
limits and integrals, if we know that the functions f,, are all majorized by a single 
absolutely integrable function. This important theorem is known as the Lebesgue 
dominated convergence theorem and is extremely useful: 


Theorem 8.3.4 (Lebesgue dominated convergence theorem) Let $\Omega$ be a measurable
subset of $\mathbf{R}^n$, and let $f_1, f_2, \ldots$ be a sequence of measurable functions from $\Omega$ to $\mathbf{R}^*$
which converge pointwise. Suppose also that there is an absolutely integrable function
$F : \Omega \to [0, \infty]$ such that $|f_n(x)| \leq F(x)$ for all $x \in \Omega$ and all $n = 1, 2, 3, \ldots$. Then

\[ \int_\Omega \lim_{n \to \infty} f_n = \lim_{n \to \infty} \int_\Omega f_n. \]

Proof If $F$ was infinite on a set of positive measure then $F$ would not be absolutely
integrable; thus the set where $F$ is infinite has zero measure. We may delete this set
from $\Omega$ (this does not affect any of the integrals) and thus assume without loss of
generality that $F(x)$ is finite for every $x \in \Omega$, which implies the same assertion for
the $f_n(x)$.

Let $f : \Omega \to \mathbf{R}^*$ be the function $f(x) := \lim_{n \to \infty} f_n(x)$; this function exists by
hypothesis. By Lemma 7.5.10, $f$ is measurable. Also, since $|f_n(x)| \leq F(x)$ for all
$n$ and all $x \in \Omega$, we see that each $f_n$ is absolutely integrable, and by taking limits
we obtain $|f(x)| \leq F(x)$ for all $x \in \Omega$, so $f$ is also absolutely integrable. Our task
is to show that $\lim_{n \to \infty} \int_\Omega f_n = \int_\Omega f$.

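The functions $F + f_n$ are non-negative and converge pointwise to $F + f$. So by
Fatou's lemma (Lemma 8.2.13)

\[ \int_\Omega (F + f) \leq \liminf_{n \to \infty} \int_\Omega (F + f_n) \]

and thus

\[ \int_\Omega f \leq \liminf_{n \to \infty} \int_\Omega f_n. \]

But the functions $F - f_n$ are also non-negative and converge pointwise to $F - f$.
So by Fatou's lemma again

\[ \int_\Omega (F - f) \leq \liminf_{n \to \infty} \int_\Omega (F - f_n). \]

Since the right-hand side is $\int_\Omega F - \limsup_{n \to \infty} \int_\Omega f_n$ (why did the lim inf become
a lim sup?), we thus have

\[ \int_\Omega f \geq \limsup_{n \to \infty} \int_\Omega f_n. \]

Thus the lim inf and lim sup of $\int_\Omega f_n$ are both equal to $\int_\Omega f$, as desired.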
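
As a numerical illustration of the dominated convergence theorem, consider the sequence $x \mapsto x^n$ on $[0, 1]$, which is dominated by the constant function 1 and converges pointwise to 0 except at $x = 1$; this example is not from the text. The sketch below approximates the integrals by a midpoint rule, which for these continuous functions is assumed to approximate the Lebesgue integral; `midpoint_integral` and `power` are hypothetical helper names.

```python
def midpoint_integral(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (k + 0.5) * h) for k in range(steps)) * h


def power(n):
    """The function x -> x**n; on [0, 1] it satisfies 0 <= x**n <= 1."""
    return lambda x: x ** n


# Approximate integrals of x**n over [0, 1]; the exact values are 1 / (n + 1).
integrals = {n: midpoint_integral(power(n), 0.0, 1.0) for n in (1, 10, 100)}
```

The computed values shrink towards 0, which is the integral of the pointwise limit (a function that vanishes almost everywhere), as the theorem predicts.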


Finally, we record a lemma which is not particularly interesting in itself, but will 
have some useful consequences later in these notes. 


Definition 8.3.5 (Upper and lower Lebesgue integral) Let $\Omega$ be a measurable subset
of $\mathbf{R}^n$, and let $f: \Omega \to \mathbf{R}$ be a function (not necessarily measurable). We define the
upper Lebesgue integral $\overline{\int}_\Omega f$ to be

\[ \overline{\int}_\Omega f := \inf \left\{ \int_\Omega g : g \text{ is an absolutely integrable function from } \Omega \text{ to } \mathbf{R} \text{ that majorizes } f \right\} \]

and the lower Lebesgue integral $\underline{\int}_\Omega f$ to be

\[ \underline{\int}_\Omega f := \sup \left\{ \int_\Omega g : g \text{ is an absolutely integrable function from } \Omega \text{ to } \mathbf{R} \text{ that minorizes } f \right\}. \]


It is easy to see that $\underline{\int}_\Omega f \leq \overline{\int}_\Omega f$ (why? Use Proposition 8.3.3(c)). When $f$ is
absolutely integrable then equality occurs (why?). The converse is also true:


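Lemma 8.3.6 Let $\Omega$ be a measurable subset of $\mathbf{R}^n$, and let $f: \Omega \to \mathbf{R}$ be a function
(not necessarily measurable). Let $A$ be a real number, and suppose $\overline{\int}_\Omega f = \underline{\int}_\Omega f = A$.
Then $f$ is absolutely integrable, and

\[ \int_\Omega f = \overline{\int}_\Omega f = \underline{\int}_\Omega f = A. \]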


Proof By definition of upper Lebesgue integral, for every integer $n \geq 1$ we may find
an absolutely integrable function $f_n^+: \Omega \to \mathbf{R}$ which majorizes $f$ such that

\[ \int_\Omega f_n^+ \leq A + \frac{1}{n}. \]

Similarly we may find an absolutely integrable function $f_n^-: \Omega \to \mathbf{R}$ which
minorizes $f$ such that

\[ \int_\Omega f_n^- \geq A - \frac{1}{n}. \]
Let $F^+ := \inf_n f_n^+$ and $F^- := \sup_n f_n^-$. Then $F^+$ and $F^-$ are measurable (by Lemma
7.5.10) and absolutely integrable (because they are squeezed between the absolutely
integrable functions $f_1^+$ and $f_1^-$, for instance). Also, $F^+$ majorizes $f$ and
$F^-$ minorizes $f$. Finally, we have

\[ \int_\Omega F^+ \leq \int_\Omega f_n^+ \leq A + \frac{1}{n} \]

for every $n$, and hence

\[ \int_\Omega F^+ \leq A. \]

Similarly we have

\[ \int_\Omega F^- \geq A, \]

but $F^+$ majorizes $F^-$, and hence $\int_\Omega F^+ \geq \int_\Omega F^-$. Hence we must have

\[ \int_\Omega F^+ = \int_\Omega F^- = A \]

and therefore

\[ \int_\Omega (F^+ - F^-) = 0. \]
Q 


By Proposition 8.2.6(a), we thus have $F^+(x) = F^-(x)$ for almost every $x$. But since
$f$ is squeezed between $F^-$ and $F^+$, we thus have $f(x) = F^+(x) = F^-(x)$ for almost
every $x$. In particular, $f$ differs from the absolutely integrable function $F^+$ only on
a set of measure zero and is thus measurable (see Exercise 7.5.5) and absolutely
integrable, with

\[ \int_\Omega f = \int_\Omega F^+ = \int_\Omega F^-. \]

In particular

\[ \int_\Omega f = A \]

as desired.


— Exercise — 


Exercise 8.3.1 Prove (8.1) whenever $\Omega$ is a measurable subset of $\mathbf{R}^n$ and $f$ is an
absolutely integrable function.


Exercise 8.3.2 Prove Proposition 8.3.3. (Hint: for (b), break $f$, $g$, and $f + g$ up
into positive and negative parts, and try to write everything in terms of integrals of 
non-negative functions only, using Lemma 8.2.10.) 


Exercise 8.3.3 Let $f: \mathbf{R} \to \mathbf{R}$ and $g: \mathbf{R} \to \mathbf{R}$ be absolutely integrable, measurable
functions such that $f(x) \leq g(x)$ for all $x \in \mathbf{R}$, and that $\int_{\mathbf{R}} f = \int_{\mathbf{R}} g$. Show that
$f(x) = g(x)$ for almost every $x \in \mathbf{R}$ (i.e., that $f(x) = g(x)$ for all $x \in \mathbf{R}$ except
possibly for a set of measure zero).


Exercise 8.3.4 For each $n = 1, 2, 3, \ldots$, let $f_n: \mathbf{R} \to \mathbf{R}$ be the function
$f_n = \chi_{[n,n+1)} - \chi_{[n+1,n+2)}$; i.e., let $f_n(x)$ equal $+1$ when $x \in [n, n+1)$, equal $-1$ when
$x \in [n+1, n+2)$, and $0$ everywhere else. Show that

\[ \int_{\mathbf{R}} \sum_{n=1}^\infty f_n \neq \sum_{n=1}^\infty \int_{\mathbf{R}} f_n. \]

Explain why this does not contradict Corollary 8.2.11.


8.4 Comparison with the Riemann Integral


We have spent a lot of effort constructing the Lebesgue integral, but have not yet 
addressed the question of how to actually compute any Lebesgue integrals, and 
whether Lebesgue integration is any different from the Riemann integral (say for 
integrals in one dimension). Now we show that the Lebesgue integral is a generaliza- 
tion of the Riemann integral. To clarify the following discussion, we shall temporarily 
distinguish the Riemann integral from the Lebesgue integral by writing the Riemann
integral $\int_I f$ as $R.\int_I f$.
Our objective here is to prove


Proposition 8.4.1 Let $I \subseteq \mathbf{R}$ be a bounded interval, and let $f: I \to \mathbf{R}$ be a Riemann
integrable function. Then $f$ is also absolutely integrable, and $\int_I f = R.\int_I f$.


Proof Write $A := R.\int_I f$. Since $f$ is Riemann integrable, we know that the upper
and lower Riemann integrals are equal to $A$. Thus, for every $\varepsilon > 0$, there exists a
partition $\mathbf{P}$ of $I$ into smaller intervals $J$ such that

\[ A - \varepsilon \leq \sum_{J \in \mathbf{P}} |J| \inf_{x \in J} f(x) \leq A \leq \sum_{J \in \mathbf{P}} |J| \sup_{x \in J} f(x) \leq A + \varepsilon, \]

where $|J|$ denotes the length of $J$. Note that $|J|$ is the same as $m(J)$, since $J$ is a
box.
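Let $f_\varepsilon^-: I \to \mathbf{R}$ and $f_\varepsilon^+: I \to \mathbf{R}$ be the functions

\[ f_\varepsilon^-(x) := \sum_{J \in \mathbf{P}} \Bigl( \inf_{x \in J} f(x) \Bigr) \chi_J(x) \]

and

\[ f_\varepsilon^+(x) := \sum_{J \in \mathbf{P}} \Bigl( \sup_{x \in J} f(x) \Bigr) \chi_J(x); \]

these are simple functions and hence measurable and absolutely integrable. By
Lemma 8.1.9 we have

\[ \int_I f_\varepsilon^- = \sum_{J \in \mathbf{P}} |J| \inf_{x \in J} f(x) \]

and

\[ \int_I f_\varepsilon^+ = \sum_{J \in \mathbf{P}} |J| \sup_{x \in J} f(x) \]

and hence

\[ A - \varepsilon \leq \int_I f_\varepsilon^- \leq A \leq \int_I f_\varepsilon^+ \leq A + \varepsilon. \]

Since $f_\varepsilon^+$ majorizes $f$, and $f_\varepsilon^-$ minorizes $f$, we thus have

\[ A - \varepsilon \leq \underline{\int}_I f \leq \overline{\int}_I f \leq A + \varepsilon \]

for every $\varepsilon$, and thus

\[ \underline{\int}_I f = \overline{\int}_I f = A, \]

and hence by Lemma 8.3.6, $f$ is absolutely integrable with $\int_I f = A$, as desired.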
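
To make the squeezing in this proof concrete, here is a small sketch that computes the integrals of the lower and upper step functions for one example, using exact rational arithmetic. The function name `step_integrals`, the choice $f(x) = x^2$ on $[0,1]$ and the uniform partition are assumptions made for illustration. The code also assumes $f$ is continuous and nondecreasing, so that the infimum and supremum on each piece are the values at its endpoints.

```python
from fractions import Fraction

def step_integrals(f, a, b, pieces):
    """Integrals of the minorizing and majorizing step functions of f on a
    uniform partition of [a, b) into `pieces` half-open intervals.

    Assumes f is continuous and nondecreasing, so that on [left, right)
    the infimum of f is f(left) and the supremum is f(right).
    """
    width = (b - a) / pieces
    lower = Fraction(0)
    upper = Fraction(0)
    for k in range(pieces):
        left = a + k * width
        right = left + width
        lower += width * f(left)    # |J| * inf of f over J
        upper += width * f(right)   # |J| * sup of f over J
    return lower, upper

def square(x):
    return x * x

lo, hi = step_integrals(square, Fraction(0), Fraction(1), 1000)
print(float(lo), float(hi), hi - lo)
```

Here `lo` and `hi` play the roles of $\int_I f_\varepsilon^-$ and $\int_I f_\varepsilon^+$. They bracket the value $A = 1/3$, and their gap is exactly $1/1000$, so refining the partition forces the lower and upper Lebesgue integrals of $f$ to agree.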


Thus every Riemann integrable function is also Lebesgue integrable, at least on 
bounded intervals, and we no longer need the $R.\int_I f$ notation. However, the converse
is not true. Take for instance the function $f: [0, 1] \to \mathbf{R}$ defined by $f(x) := 1$ when
$x$ is rational, and $f(x) := 0$ when $x$ is irrational. Then from Proposition 11.7.1 we
know that $f$ is not Riemann integrable. On the other hand, $f$ is the characteristic
function of the set $\mathbf{Q} \cap [0, 1]$, which is countable and hence has measure zero. Thus $f$ is
Lebesgue integrable and $\int_{[0,1]} f = 0$. Thus the Lebesgue integral can handle more
functions than the Riemann integral; this is one of the primary reasons why we use 
the Lebesgue integral in analysis. (The other reason is that the Lebesgue integral 
interacts well with limits, as the Lebesgue monotone convergence theorem, Fatou’s 
lemma, and Lebesgue dominated convergence theorem already attest. There are no 
comparable theorems for the Riemann integral.) 


8.5 Fubini’s Theorem 


In one dimension we have shown that the Lebesgue integral is connected to the 
Riemann integral. Now we will try to understand the connection in higher dimensions. 
To simplify the discussion we shall just study two-dimensional integrals, although 
the arguments we present here can easily be extended to higher dimensions. 

We shall study integrals of the form $\int_{\mathbf{R}^2} f$. Note that once we know how to integrate
on $\mathbf{R}^2$, we can integrate on measurable subsets $\Omega$ of $\mathbf{R}^2$, since $\int_\Omega f$ can be rewritten
as $\int_{\mathbf{R}^2} f \chi_\Omega$.

Let $f(x, y)$ be a function of two variables. In principle, we have three
different ways to integrate $f$ on $\mathbf{R}^2$. First of all, we can use the two-dimensional
Lebesgue integral, to obtain $\int_{\mathbf{R}^2} f$. Secondly, we can fix $x$ and compute a
one-dimensional integral in $y$, and then take that quantity and integrate in $x$, thus obtaining
$\int_{\mathbf{R}} \left( \int_{\mathbf{R}} f(x, y) \, dy \right) dx$. Thirdly, we could fix $y$ and integrate in $x$, and then integrate
in $y$, thus obtaining $\int_{\mathbf{R}} \left( \int_{\mathbf{R}} f(x, y) \, dx \right) dy$.

Fortunately, if the function $f$ is absolutely integrable on $\mathbf{R}^2$, then all three integrals
are equal:

Theorem 8.5.1 (Fubini’s theorem) Let $f: \mathbf{R}^2 \to \mathbf{R}$ be an absolutely integrable
function. Then there exist absolutely integrable functions $F: \mathbf{R} \to \mathbf{R}$ and $G: \mathbf{R} \to \mathbf{R}$
such that for almost every $x$, $f(x, y)$ is absolutely integrable in $y$ with

\[ F(x) = \int_{\mathbf{R}} f(x, y) \, dy, \]

and for almost every $y$, $f(x, y)$ is absolutely integrable in $x$ with

\[ G(y) = \int_{\mathbf{R}} f(x, y) \, dx. \]

Finally, we have

\[ \int_{\mathbf{R}} F(x) \, dx = \int_{\mathbf{R}^2} f = \int_{\mathbf{R}} G(y) \, dy. \]


Remark 8.5.2 Very roughly speaking, Fubini’s theorem says that

\[ \int_{\mathbf{R}} \left( \int_{\mathbf{R}} f(x, y) \, dy \right) dx = \int_{\mathbf{R}^2} f = \int_{\mathbf{R}} \left( \int_{\mathbf{R}} f(x, y) \, dx \right) dy. \]

This allows us to compute two-dimensional integrals by splitting them into two
one-dimensional integrals. The reason why we do not write Fubini’s theorem this way,
though, is that it is possible that the integral $\int_{\mathbf{R}} f(x, y) \, dy$ does not actually exist
for every $x$, and similarly $\int_{\mathbf{R}} f(x, y) \, dx$ does not exist for every $y$; Fubini’s theorem
only asserts that these integrals exist for almost every $x$ and $y$. For instance,
if $f(x, y)$ is the function which equals $1$ when $y > 0$ and $x = 0$, equals $-1$ when
$y < 0$ and $x = 0$, and is zero otherwise, then $f$ is absolutely integrable on $\mathbf{R}^2$ and
$\int_{\mathbf{R}^2} f = 0$ (since $f$ equals zero almost everywhere in $\mathbf{R}^2$), but $\int_{\mathbf{R}} f(x, y) \, dy$ is not
absolutely integrable when $x = 0$ (though it is absolutely integrable for every other
$x$).
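
For a simple function built from finitely many boxes, the three integrals in Fubini's theorem reduce to finite sums, and the equality can be checked by hand or by machine. The sketch below does this with exact rational arithmetic. The breakpoints `xs`, `ys` and the values `c` are hypothetical data chosen only for illustration, and the variable names are not taken from the text.

```python
from fractions import Fraction

# Hypothetical simple function: f equals c[i][j] on the box
# [xs[i], xs[i+1]) x [ys[j], ys[j+1]) and 0 elsewhere.
xs = [Fraction(0), Fraction(1), Fraction(3)]
ys = [Fraction(0), Fraction(1, 2), Fraction(2), Fraction(3)]
c = [
    [Fraction(1), Fraction(-2), Fraction(3)],
    [Fraction(0), Fraction(5, 2), Fraction(-1)],
]

dx = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]  # side lengths in x
dy = [ys[j + 1] - ys[j] for j in range(len(ys) - 1)]  # side lengths in y

# Two-dimensional integral: sum of (value) * (measure of the box).
double_integral = sum(
    c[i][j] * dx[i] * dy[j] for i in range(len(dx)) for j in range(len(dy))
)

# F is constant on each x-interval: F[i] is the integral of f(x, .) in y.
F = [sum(c[i][j] * dy[j] for j in range(len(dy))) for i in range(len(dx))]
# G is constant on each y-interval: G[j] is the integral of f(., y) in x.
G = [sum(c[i][j] * dx[i] for i in range(len(dx))) for j in range(len(dy))]

integral_of_F = sum(F[i] * dx[i] for i in range(len(dx)))
integral_of_G = sum(G[j] * dy[j] for j in range(len(dy)))

print(double_integral, integral_of_F, integral_of_G)
```

All three quantities agree because the double sum can be grouped by rows or by columns. The content of Fubini's theorem is that this regrouping survives the passage from simple functions to general absolutely integrable ones.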


Proof The proof of Fubini’s theorem is quite complicated, and we will only give a 
sketch here. We begin with a series of reductions. 
Roughly speaking (ignoring issues relating to sets of measure zero), we have to
show that

\[ \int_{\mathbf{R}} \left( \int_{\mathbf{R}} f(x, y) \, dy \right) dx = \int_{\mathbf{R}^2} f \]

together with a similar equality with $x$ and $y$ reversed. We shall just prove the above
equality, as the other one is very similar.




First of all, it suffices to prove the theorem for non-negative functions, since the
general case then follows by writing a general function $f$ as a difference $f^+ - f^-$ of
two non-negative functions, and applying Fubini’s theorem to $f^+$ and $f^-$ separately
(and using Proposition 8.3.3(a) and (b)). Thus we will henceforth assume that $f$ is
non-negative.

Next, it suffices to prove the theorem for non-negative functions $f$ supported on
a bounded set such as $[-N, N] \times [-N, N]$ for some positive integer $N$. Indeed,
once one obtains Fubini’s theorem for such functions, one can then write a general
function $f$ as the supremum of such compactly supported functions as

\[ f = \sup_{N > 0} f \chi_{[-N, N] \times [-N, N]}, \]

apply Fubini’s theorem to each function $f \chi_{[-N, N] \times [-N, N]}$ separately, and then take
suprema using the monotone convergence theorem. Thus we will henceforth assume
that $f$ is supported on $[-N, N] \times [-N, N]$.

By another similar argument, it suffices to prove the theorem for non-negative
simple functions supported on $[-N, N] \times [-N, N]$, since one can use Lemma 8.1.5
to write $f$ as the supremum of simple functions (which must also be supported on
$[-N, N] \times [-N, N]$), apply Fubini’s theorem to each simple function, and then take suprema
using the monotone convergence theorem. Thus we may assume that $f$ is a
non-negative simple function supported on $[-N, N] \times [-N, N]$.

Next, we see that it suffices to prove the theorem for characteristic functions
supported in $[-N, N] \times [-N, N]$. This is because every simple function is a linear
combination of characteristic functions, and so we can deduce Fubini’s theorem for
simple functions from Fubini’s theorem for characteristic functions. Thus we may
take $f = \chi_E$ for some measurable $E \subseteq [-N, N] \times [-N, N]$. Our task is then to
show (ignoring sets of measure zero) that

\[ \int_{[-N,N]} \left( \int_{[-N,N]} \chi_E(x, y) \, dy \right) dx = m(E). \]

It will suffice to show the upper Lebesgue integral estimate

\[ \overline{\int}_{[-N,N]} \left( \overline{\int}_{[-N,N]} \chi_E(x, y) \, dy \right) dx \leq m(E). \tag{8.2} \]


We will prove this estimate later. Once we show this for every set $E$, we may substitute
$E$ with $[-N, N] \times [-N, N] \setminus E$ and obtain

\[ \overline{\int}_{[-N,N]} \left( \overline{\int}_{[-N,N]} (1 - \chi_E(x, y)) \, dy \right) dx \leq 4N^2 - m(E). \]

But the left-hand side is equal to

\[ \overline{\int}_{[-N,N]} \left( 2N - \underline{\int}_{[-N,N]} \chi_E(x, y) \, dy \right) dx, \]

which is in turn equal to

\[ 4N^2 - \underline{\int}_{[-N,N]} \left( \underline{\int}_{[-N,N]} \chi_E(x, y) \, dy \right) dx, \]

and thus we have

\[ \underline{\int}_{[-N,N]} \left( \underline{\int}_{[-N,N]} \chi_E(x, y) \, dy \right) dx \geq m(E). \]


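In particular we have

\[ \underline{\int}_{[-N,N]} \left( \overline{\int}_{[-N,N]} \chi_E(x, y) \, dy \right) dx \geq m(E), \]

and hence, combining this with (8.2), by Lemma 8.3.6 we see that $\overline{\int}_{[-N,N]} \chi_E(x, y) \, dy$
is absolutely integrable and

\[ \int_{[-N,N]} \left( \overline{\int}_{[-N,N]} \chi_E(x, y) \, dy \right) dx = m(E). \]

A similar argument shows that

\[ \int_{[-N,N]} \left( \underline{\int}_{[-N,N]} \chi_E(x, y) \, dy \right) dx = m(E) \]

and hence

\[ \int_{[-N,N]} \left( \overline{\int}_{[-N,N]} \chi_E(x, y) \, dy - \underline{\int}_{[-N,N]} \chi_E(x, y) \, dy \right) dx = 0. \]

Thus by Proposition 8.2.6(a) we have

\[ \underline{\int}_{[-N,N]} \chi_E(x, y) \, dy = \overline{\int}_{[-N,N]} \chi_E(x, y) \, dy \]

for almost every $x \in [-N, N]$. Thus $\chi_E(x, y)$ is absolutely integrable in $y$ for almost
every $x$, and $\int_{[-N,N]} \chi_E(x, y) \, dy$ is thus equal (almost everywhere) to a function $F(x)$
such that

\[ \int_{[-N,N]} F(x) \, dx = m(E) \]

as desired.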
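It remains to prove the bound (8.2). Let $\varepsilon > 0$ be arbitrary. Since $m(E)$ is the
same as the outer measure $m^*(E)$, we know that there exists an at most countable
collection $(B_j)_{j \in J}$ of boxes such that $E \subseteq \bigcup_{j \in J} B_j$ and

\[ \sum_{j \in J} m(B_j) \leq m(E) + \varepsilon. \]

Intersecting each box with $[-N, N] \times [-N, N]$ if necessary (which does not increase its
measure), we may assume that every $B_j$ lies in this square. Each box $B_j$ can be written
as $B_j = I_j \times I'_j$ for some intervals $I_j$ and $I'_j$. Observe that

\[
\begin{aligned}
m(B_j) = |I_j| |I'_j| &= \int_{I_j} |I'_j| \, dx = \int_{I_j} \left( \int_{I'_j} dy \right) dx \\
&= \int_{[-N,N]} \left( \int_{[-N,N]} \chi_{I_j \times I'_j}(x, y) \, dy \right) dx \\
&= \int_{[-N,N]} \left( \int_{[-N,N]} \chi_{B_j}(x, y) \, dy \right) dx.
\end{aligned}
\]

Adding this over all $j \in J$ (using Corollary 8.2.11) we obtain

\[ \sum_{j \in J} m(B_j) = \int_{[-N,N]} \left( \int_{[-N,N]} \sum_{j \in J} \chi_{B_j}(x, y) \, dy \right) dx. \]

In particular we have

\[ \int_{[-N,N]} \left( \int_{[-N,N]} \sum_{j \in J} \chi_{B_j}(x, y) \, dy \right) dx \leq m(E) + \varepsilon. \]

But $\sum_{j \in J} \chi_{B_j}$ majorizes $\chi_E$ (why?) and thus

\[ \overline{\int}_{[-N,N]} \left( \overline{\int}_{[-N,N]} \chi_E(x, y) \, dy \right) dx \leq m(E) + \varepsilon. \]

But $\varepsilon$ is arbitrary, and so we have (8.2) as desired. This completes the proof of
Fubini’s theorem.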
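
The covering step in the proof of (8.2) can be illustrated on a concrete set. The sketch below takes the triangle $E = \{(x, y) : 0 \leq y \leq x \leq 1\}$, whose measure $1/2$ is taken as known, and covers it by $n$ boxes. The triangle, the particular cover and the names `cover_boxes` and `total_measure` are assumptions made for this example only.

```python
from fractions import Fraction

def cover_boxes(n):
    """Boxes [k/n, (k+1)/n] x [0, (k+1)/n] for k = 0, ..., n-1.

    Together they cover the triangle E = {(x, y): 0 <= y <= x <= 1}:
    if k/n <= x <= (k+1)/n and 0 <= y <= x, then y <= (k+1)/n.
    Each box is returned as ((x0, x1), (y0, y1)).
    """
    return [
        ((Fraction(k, n), Fraction(k + 1, n)), (Fraction(0), Fraction(k + 1, n)))
        for k in range(n)
    ]

def total_measure(boxes):
    """Sum of the areas m(B_j) of the given boxes."""
    return sum((x1 - x0) * (y1 - y0) for (x0, x1), (y0, y1) in boxes)

measure_of_E = Fraction(1, 2)  # area of the triangle, taken as known

# Excess of the cover over m(E); this plays the role of epsilon in the proof.
excess = {n: total_measure(cover_boxes(n)) - measure_of_E for n in (1, 10, 100)}
print(excess)
```

The excess equals $1/(2n)$, so finer covers bring $\sum_j m(B_j)$ within any prescribed $\varepsilon$ of $m(E)$. For each fixed $x$, the slice of the covering boxes has length at least $x$, which is the length of the slice of $E$; this is the majorization used in the last step of the proof.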